Reliable stabilization of N-linked polypeptide native states with enhanced aromatic sequons located in polypeptide tight turns

ABSTRACT

A chimeric therapeutic polypeptide of a pre-existing therapeutic polypeptide is disclosed, as are a method of enhancing folded stabilization and a pharmaceutical composition of the glycosylated chimer. The pre-existing and chimeric polypeptides have substantially the same length, substantially the same amino acid residue sequence, and exhibit at least one tight turn containing a sequence of four to about seven amino acid residues in which at least two amino acid side chains extend on the same side of the tight turn and are within less than about 7 Å of each other. The chimeric therapeutic polypeptide has the sequon Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser (SEQ ID NO:001) within that tight turn sequence such that the side chains of the Aro, Asn and Thr/Ser amino acid residues project on the same side of the turn and are within less than about 7 Å of each other. That sequon is absent from the pre-existing therapeutic polypeptide.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority from provisional application No. 61/514,202, filed Aug. 2, 2011, whose disclosures are incorporated herein by reference.

GOVERNMENTAL SUPPORT

The present invention was made with governmental support from National Institutes of Health grant GM051105 and NRSA NIH post-doctoral fellowship F32 GM086039. The US government has certain rights in the invention.

BACKGROUND ART

Nearly one-third of the eukaryotic proteome traverses the cellular secretory pathway [Imperiali, Acc. Chem. Res. 30, 452-459 (1997)]. Many of these proteins are co-translationally N-glycosylated at Asn residues within the conserved Asn¹-Yyy²-Thr³/Ser³ sequon, where Yyy is any amino acid residue other than proline and is located at position 2 between an asparagine (Asn) at the amino-terminal end of the sequon and a threonine or serine (Thr/Ser) at the carboxy-terminal end of the sequon. N-Glycosylation can increase the stability of proteins; however, the molecular basis for this enhanced stability is incompletely understood.

As the ribosome inserts polypeptide chains into the endoplasmic reticulum (ER), the enzyme oligosaccharyl transferase (OST) attaches the highly conserved Glc₃Man₉GlcNAc₂ (where Glc is glucose, Man is mannose, and GlcNAc is N-acetylglucosamine) glycan (oligosaccharide) en bloc to the N atom of the Asn side chain in a subset of Asn-Yyy-Thr/Ser sequons [Kornfeld et al., Annu Rev Biochem 54, 631-664 (1985); and Kelleher et al., Glycobiology 16, 47R-62R (2006)]. N-linked glycans have important extrinsic effects on folding in the ER by allowing glycoproteins to enter the calnexin/calreticulin (CNX/CRT) folding/degradation pathway [Molinari, Nat Chem Biol 3, 313-320 (2007); Helenius et al., Science 291, 2364-2369 (2001)]. N-glycans can also have intrinsic effects on protein folding by enhancing protein folding efficiency in cells, even when the CNX/CRT pathway is absent [Banerjee et al., Proc Natl Acad Sci U S A 104, 11676-11681 (2007); Trombetta, Glycobiology 13, 77R-91R (2003)] or when the N-glycan does not allow CNX/CRT interactions [Stanley et al., FASEB J 9, 1436-1444 (1995)], consistent with reports that N-glycans stabilize protein structure, accelerate folding, and reduce aggregation in vitro [Wormald et al., Structure with Folding & Design 7, R155-R160 (1999); Jitsuhara et al., J Biochem 132, 803-811 (2002); Mitra et al., Trends in Biochemical Sciences 31, 156-163 (2006)].

The increased use of protein therapeutics has made issues such as stabilized polypeptide structure, accelerated folding, and reduced aggregation of paramount importance to the pharmaceutical industry [Li et al., Curr Opin Biotechnol 20, 678-684 (2009); Sinclair et al., J Pharm Sci 94, 1626-1635 (2005); Sola et al., BioDrugs 24, 9-21 (2010); Walsh et al., Nat Biotechnol 24, 1241-1252 (2006)]. The therapeutic benefits of N-glycosylation are exemplified by darbepoetin alfa (an erythropoietin variant with two additional N-glycans) [Egrie et al., Exp Hematol 31, 290-299 (2003)], interferon β [Runkel et al., Pharm Res 15, 641-649 (1998)], and follicle stimulating hormone [Perlman et al., J Clin Endocrinol Metab 88, 3227-3235 (2003)].

A number of types of tight turns within the secondary structures of proteins and polypeptides have been described in the literature. These structures are referred to as a δ-turn that encompasses two amino acid residues, a γ-turn that involves three residues, a β-turn that involves four amino acid residues, an α-turn that involves five residues and a π-turn that involves six residues. [Chou, Anal Biochem 286, 1-16 (2000).]

A β-turn or reverse turn contains a sequence of four consecutive amino acid residues that are designated i, i+1, i+2 and i+3, in the direction from N-terminus toward C-terminus of the polypeptide. The five residues of an α-turn are designated i, i+1, i+2, i+3 and i+4. Most, but not all, reverse turns and α-turns contain a hydrogen bond between the first and fourth or first and fifth residues, respectively, in which the residue designated i contains a peptide bond (peptidyl) carbonyl group (>C═O), whereas the fourth residue, i+3, or the fifth residue, i+4, contains the peptidyl N-H group whose hydrogen is hydrogen-bonded to the carbonyl oxygen of the i residue. Residues bonded to the amino group of the i residue (toward the amino-terminus from the i residue) are designated i−1, i−2, i−3, etc.

Another way to define a reverse turn and an α-turn motif is by the close approach, less than 7 Å, of the Cα atoms (alpha-carbon atoms) of the residues of the motif. Thus, one can define a β-turn and an α-turn by the close approach of the Cα atoms of residues i and i+3 or i and i+4, respectively. [Chou, Anal Biochem 286, 1-16 (2000).] This distance implies a particular backbone geometry in which the chain turns back on itself or, more generally, changes direction, and in which the residue side chains are on the same side of the backbone chain.
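The Cα-distance criterion lends itself to a simple geometric screen. The following Python sketch is illustrative only: it assumes Cα coordinates (in Å) have already been extracted, for example from a PDB file, as a list of (x, y, z) tuples, and the function names and toy coordinates are hypothetical. It applies only the distance test, not the hydrogen-bond criterion.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Residue offset between the first and last residue of each turn type
# (i to i+3 for a beta-turn, i to i+4 for an alpha-turn).
TURN_SPAN = {"beta": 3, "alpha": 4}


def ca_distance(a: Coord, b: Coord) -> float:
    """Euclidean distance between two C-alpha atoms, in angstroms."""
    return math.dist(a, b)


def candidate_turns(ca_coords: Sequence[Coord], turn: str = "beta",
                    cutoff: float = 7.0) -> List[int]:
    """Return each index i whose C-alpha lies closer than `cutoff` to that of i + span.

    Only the distance criterion is applied; hydrogen bonds and backbone
    dihedral angles are not examined.
    """
    span = TURN_SPAN[turn]
    return [
        i for i in range(len(ca_coords) - span)
        if ca_distance(ca_coords[i], ca_coords[i + span]) < cutoff
    ]


# Hypothetical toy chains: an extended strand (3.8 A steps) and a short
# segment that folds back so that residue 3 returns near residue 0.
straight = [(3.8 * k, 0.0, 0.0) for k in range(5)]
hairpin = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.8, 3.2, 0.0), (2.2, 4.6, 0.0)]

print(candidate_turns(straight))  # -> []
print(candidate_turns(hairpin))   # -> [0]
```

Because the test ignores the backbone conformation of residues i+1 and i+2, helical segments would also pass; the distance screen is therefore best treated as a first filter.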

β-Turns are often described as orienting structures because they orient α-helices and β-sheets, thereby indirectly defining the topology of proteins. They are among the most abundant secondary structures.

Several types of reverse turns have been identified and are designated types I, I′, II, III, IV, V and VI. Types I and II are the most common reverse turns, the essential difference between them being the orientation of the peptide bond between residues i+1 and i+2. The i+2 residue of the type II turn is essentially restricted to glycine because of steric interference with the carbonyl group of the i+1 residue.
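The distinction between turn types rests on the backbone dihedral angles (φ, ψ) of residues i+1 and i+2. As a hedged sketch, the Python below compares observed angles with commonly quoted idealized values for types I, I′ and II (written I' in the code) and accepts a type only if all four angles fall within an assumed 40° tolerance. The ideal values, the tolerance and the function names are assumptions for illustration rather than part of this disclosure.

```python
from typing import Dict, Optional, Tuple

Angles = Tuple[float, float]  # (phi, psi) in degrees

# Commonly quoted idealized (phi, psi) values for residues i+1 and i+2.
IDEAL_TURNS: Dict[str, Tuple[Angles, Angles]] = {
    "I": ((-60.0, -30.0), (-90.0, 0.0)),
    "I'": ((60.0, 30.0), (90.0, 0.0)),
    "II": ((-60.0, 120.0), (80.0, 0.0)),
}


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def classify_turn(res1: Angles, res2: Angles, tol: float = 40.0) -> Optional[str]:
    """Return the listed turn type whose four ideal angles are all within `tol`.

    Returns None when no listed type fits (e.g. a type IV turn or a non-turn).
    """
    observed = (*res1, *res2)
    for name, (ideal1, ideal2) in IDEAL_TURNS.items():
        ideal = (*ideal1, *ideal2)
        if all(angle_diff(o, t) <= tol for o, t in zip(observed, ideal)):
            return name
    return None


print(classify_turn((-65.0, -25.0), (-85.0, 5.0)))  # -> I
print(classify_turn((-55.0, 125.0), (85.0, -5.0)))  # -> II
```

The wrap-around in `angle_diff` matters because dihedral angles are periodic: 170° and −170° differ by 20°, not 340°.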

It was recently shown that naturally occurring N-glycosylation at a single Asn residue within a reverse turn of the adhesion domain of human glycoprotein CD2 (HsCD2ad) stabilizes the protein by −3.1 kcal mol⁻¹, makes folding four times faster, and makes unfolding 50 times slower in vitro [Hanson et al., Proc Natl Acad Sci U S A 106, 3131-3136 (2009)]. However, introducing N-glycans into proteins that are not normally glycosylated (naïve proteins) has rarely led to substantially improved folding energetics [Hackenberger et al., J Am Chem Soc 127, 12882-12889 (2005); Wang et al., Biochemistry 35, 7299-7307 (1996); Elliott et al., J Biol Chem 279, 16854-16862 (2004)].

The present inventors and co-workers recently showed that glycosylation of an Asn residue within the sequence Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser, where Aro is an aromatic amino acid residue such as histidine, phenylalanine, tyrosine or tryptophan, n is zero, 1, 2, 3 or 4, Xxx is an amino acid residue other than an aromatic residue, p is zero or one, Zzz is any amino acid residue, Asn is asparagine, Yyy is any amino acid residue other than proline, and Thr/Ser is one or the other of the amino acid residues threonine and serine, stabilizes the glycosylation-naïve rat CD2 adhesion domain (RnCD2ad) and human muscle acylphosphatase (AcyP2) by about −2 kcal mol⁻¹, provided that Asn is located at the i+2 position of a type I β-turn with a G1 β-bulge [terminology of Sibanda et al., J Mol Biol 206(4), 759-777 (1989); Richardson, Adv Protein Chem 34, 167-339 (1981)], hereafter called a type I β-bulge turn [Culyba et al., Science 331, 571-575 (2011); Application Ser. No. 61/380,967, filed 8 Sep. 2010].

Published structural data [Wyss et al., Science 269, 1273-1278 (1995)] from the human ortholog of RnCD2ad (HsCD2ad, FIG. 1A) suggest that placement of an N-glycan at i+2 in the type I β-bulge turn context permits the α-face of GlcNAc1 of the N-glycan to engage in stabilizing hydrophobic interactions with the aromatic ring of Phe at the i position and with the side-chain methyl group of Thr at the i+4 position (a stabilizing CH/π interaction may also play a role [Laughrey et al., J Am Chem Soc 130(44), 14625-14633 (2008)]).

Thus, it is hypothesized that the substantial energetic benefits of glycosylating a protein such as HsCD2ad depend on both the reverse-turn context of the glycosylation site and the surrounding amino acid sequence. Results supporting this hypothesis as applied to therapeutic polypeptides are shown and discussed hereinafter.

BRIEF DESCRIPTION OF THE INVENTION

A chimeric therapeutic polypeptide of a pre-existing therapeutic polypeptide is contemplated. Such a therapeutic chimeric polypeptide is often present in isolated and purified form.

The pre-existing therapeutic polypeptide has a length of about 15 to about 1000, preferably about 25 to about 500, and more preferably about 35 to about 300, amino acid residues, and exhibits a secondary structure that comprises at least one tight turn containing a sequence of four to about seven amino acid residues in which at least two amino acid side chains extend on the same side of the tight turn and are within less than about 7 Å of each other. The pre-existing therapeutic polypeptide lacks the sequon, in the direction from left to right and from N-terminus to C-terminus, Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser [SEQ ID NO:001], within that sequence of four to about seven amino acid residues. In that sequon, Aro is an aromatic amino acid residue such as histidine, phenylalanine, tyrosine or tryptophan, n is zero, 1, 2, 3 or 4, Xxx is an amino acid residue other than an aromatic residue, p is zero or one, Zzz is any amino acid residue, Asn is asparagine, Yyy is any amino acid residue other than proline, Thr/Ser is one or the other of the amino acid residues threonine and serine. Except for the four to about seven residues within the tight turn, a contemplated chimeric therapeutic polypeptide has the same length, at least one tight turn and substantially the same amino acid residue sequence as the pre-existing therapeutic polypeptide. The two sequences differ by the presence in the chimeric therapeutic polypeptide of the sequon, Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser (SEQ ID NO:001) as defined above. That sequon is located at the same position in the tight turn as the sequence of four to about seven amino acid residues such that the side chains of the Aro, Asn and Thr/Ser amino acid residues project on the same side of the turn and are within less than about 7 Å of each other. In one preferred embodiment, “n” is 1 and “p” is 1 and the chimeric polypeptide contains a Type II β-turn in a six-residue loop.

In another preferred embodiment, “n” is 1 and “p” is zero. The two polypeptide sequences differ by the presence in the chimeric therapeutic polypeptide of the sequon, Aro-Xxx-Asn-Yyy-Thr/Ser (SEQ ID NO:002) as defined above. The chimeric polypeptide preferably contains a five-residue type I β-bulge turn.

In still another preferred embodiment, “n” is zero and “p” is zero. The two sequences differ by the presence in the chimeric therapeutic polypeptide of the sequon, Aro-Asn-Yyy-Thr/Ser (SEQ ID NO:003) as defined above. Here, a preferred chimeric polypeptide contains a four-residue type I′ β-turn.

In preferred practice, when the sequon is glycosylated, the therapeutic chimeric polypeptide exhibits a folding stabilization enhancement by about −0.5 to about −4 kcal/mol compared to the before-mentioned pre-existing therapeutic polypeptide in non-glycosylated form.

It is to be understood that substantially any and every therapeutic polypeptide that contains a tight turn in its secondary structure is contemplated herein. For example, substantially all of the Fc portions of human IgG antibodies contain one or two tight turn sequences to which the present invention can be applied. One of those sequences is often glycosylated, whereas the other is not glycosylated.

In some preferred embodiments, the sequon has the sequence, in the direction from left to right and from N-terminus to C-terminus, -Lys-(Zzz)m-Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser (SEQ ID NO:004), wherein m is zero, 1, 2, or 3, n is zero, 1, 2, 3, or 4, p is zero or one, Lys is lysine, and Zzz, Aro, Xxx, Asn, Yyy and Thr/Ser are as defined above.

A method of enhancing folded stabilization of a therapeutic polypeptide is also contemplated. A contemplated therapeutic polypeptide has a sequence of about 15 to about 1000, preferably about 25 to about 500, and more preferably about 35 to about 300, amino acid residues, and exhibits a secondary structure that comprises at least one tight turn in which the side chains of two residues in a preferably glycosylation-free sequence of four to about seven amino acid residues within the tight turn project on the same side of the turn and are within less than about 7 Å of each other. In accordance with the method, a therapeutic chimeric polypeptide is prepared. That therapeutic chimeric polypeptide has the same length and substantially the same amino acid sequence as the therapeutic polypeptide, and exhibits a secondary structure containing at least one tight turn at the same sequence position as the tight turn of the therapeutic polypeptide, except that the preferably glycosylation-free sequence of four to about seven amino acid residues is replaced with the sequon, in the direction from left to right and from N-terminus to C-terminus, Aro-(Xxx)n-(Zzz)p-Asn(Glycan)-Yyy-Thr/Ser (SEQ ID NO:005), wherein Aro is an aromatic amino acid residue, n is zero, 1, 2, 3 or 4, Xxx is an amino acid residue other than an aromatic residue, p is zero or one, Zzz is any amino acid residue, Asn(Glycan) is glycosylated asparagine, Yyy is any amino acid residue other than proline, Thr/Ser is one or the other of the amino acid residues threonine and serine, and the side chains of the Aro, Asn(Glycan) and Thr/Ser amino acid residues project on the same side of the tight turn and are within less than about 7 Å of each other.
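At the sequence level, the replacement step of the method can be sketched as a string substitution. The Python below is a minimal illustration, assuming one-letter sequences, a 0-based start index for the turn window, and a replacement of the same length as the window (the preferred case). The function names and the flanking Ala residues in the example are hypothetical; the window itself is the RnCD2* example discussed in this document (Glu-Ile-Leu-Ala-Asn-Gly-Thr replaced by Lys-Ile-Phe-Ala-Asn-Gly-Thr). A string check cannot confirm that the tight-turn geometry is preserved.

```python
import re

# Enhanced aromatic sequon Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser in one-letter code;
# His is treated as aromatic (an assumption).
SEQUON = re.compile(r"[HFYW][^HFYW]{0,4}.?N[^P][ST]")


def has_sequon(seq: str) -> bool:
    """True if `seq` contains at least one enhanced aromatic sequon."""
    return SEQUON.search(seq) is not None


def make_chimera(parent: str, start: int, replacement: str) -> str:
    """Swap the turn window parent[start:start + len(replacement)] for `replacement`.

    Keeps the chimera the same length as the parent, requires the original
    window to lack the sequon and the replacement to contain it.
    """
    end = start + len(replacement)
    if start < 0 or end > len(parent):
        raise ValueError("turn window lies outside the parent sequence")
    if has_sequon(parent[start:end]):
        raise ValueError("the original turn window already contains the sequon")
    if not has_sequon(replacement):
        raise ValueError("the replacement does not contain the sequon")
    return parent[:start] + replacement + parent[end:]


# Hypothetical parent: the RnCD2* turn window flanked by two Ala residues.
print(make_chimera("AAEILANGTAA", 2, "KIFANGT"))  # -> AAKIFANGTAA
```

Raising an error rather than silently returning the parent makes a failed design (no new sequon) impossible to overlook.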

In some embodiments, a therapeutic chimeric polypeptide is prepared by expressing a nucleic acid sequence that encodes the polypeptide sequence of the therapeutic chimeric polypeptide in a host cell that glycosylates the amino acid residue sequence Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser (SEQ ID NO:001) when present in a polypeptide sequence expressed therein to form a polypeptide containing the amino acid residue sequence Aro-(Xxx)n-(Zzz)p-Asn(Glycan)-Yyy-Thr/Ser (SEQ ID NO:005). In other embodiments, a therapeutic chimeric polypeptide is prepared by in vitro peptide synthesis.

Another embodiment is a pharmaceutical composition that comprises an effective amount of a before-discussed chimeric therapeutic polypeptide dissolved or dispersed in a pharmaceutically acceptable diluent composition. That pharmaceutical composition typically also contains water, at least when administered.

The present invention has several benefits and advantages. One benefit is that the folded state of a therapeutic polypeptide can be made thermodynamically more stable by preparing a glycosylated chimer whose amino acid residue sequence is almost identical to that of the therapeutic polypeptide.

An advantage of the invention is that the preparation of a glycosylated chimeric therapeutic polypeptide is readily accomplished.

Still further benefits and advantages will be apparent to those of skill in the art from the disclosures that follow.

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings forming a portion of this disclosure,

FIG. 1 in four parts illustrates the matching of enhanced aromatic sequons with reverse-turn hosts that can facilitate stabilizing interactions among Phe, Asn(GlcNAc1), and Thr. FIG. 1A shows a space-filling model of the Phe63-Asn65-GlcNAc-Thr67 interaction of a glycosylated five-residue type I β-bulge turn from the adhesion domain of the human protein CD2 [PDB accession code: 1GYA; Wyss et al., Science 269, 1273-1278 (1995)]; FIG. 1B illustrates a Type II β-turn in a six-residue loop [PDB accession code: 1PIN; Ranganathan et al., Cell 89, 875-886 (1997)]; FIG. 1C shows a five-residue type I β-bulge turn [PDB accession code: 2F21; Jäger et al., Proc. Natl. Acad. Sci. USA 103, 10648-10653 (2006)]; and FIG. 1D illustrates a four-residue type I′ β-turn [PDB accession code: 1ZCN; Jäger et al., Proc. Natl. Acad. Sci. USA 103, 10648-10653 (2006)]. FIGS. 1B-1D are from variants of the WW domain of human protein Pin1 that incorporate components of the enhanced aromatic sequon. Structures are rendered in PyMOL (a user-sponsored molecular visualization system on an open-source foundation), with dotted lines depicting hydrogen bonds. Interatomic distances (in Å) between the side-chain beta carbons (Cβ's) are depicted.

FIG. 2 in six parts shows in FIG. 2A that residues 63-67 of RnCD2ad retain the same five-residue type I β-bulge turn geometry found in HsCD2ad, but RnCD2ad does not require N-glycosylation to fold; FIGS. 2B and 2C show the stabilities and folding kinetics, determined by equilibrium denaturation and stopped-flow kinetic studies, of the eight RnCD2* sequences required for the thermodynamic cycle; FIG. 2D is a western blot showing that the relative ratio of N-glycosylated to non-glycosylated polypeptides from Sf9 insect cells is substantially higher for an RnCD2* variant having a Phe residue in the tight turn than for a variant that lacks the Phe residue; tabulated data are shown in FIG. 2E (N refers to N-glycosylated Asn); and FIG. 2F illustrates contact of the Phe and Thr side chains with the first GlcNAc of the N-glycan in four polypeptides, found in a PDB search, that contain type I β-bulge turns with a Phe at the i position, a glycosylated Asn residue at the i+2 position, and a Thr at the i+4 position.

FIG. 3 in four parts illustrates in FIG. 3A that the Thr43Phe (i) and Lys45Asn (i+2) mutations in the β-bulge turn of human muscle acylphosphatase (AcyP2) create an enhanced aromatic sequon because the i+4 position is already Thr; FIG. 3B shows data from an equilibrium denaturation study for determining folding free energy; FIG. 3C illustrates the sequences at positions 41-47 of four AcyP2* variants, AcyP2* (SEQ ID NO:174), g-AcyP2* (SEQ ID NO:175), AcyP2*-F (SEQ ID NO:176) and g-AcyP2*-F (SEQ ID NO:177), that differ in the identity of the side chain at position 43 (Phe or Thr) and in the presence or absence of a glycan at Asn45; and FIG. 3D is a western blot showing that the relative ratio of N-glycosylated to non-glycosylated polypeptides from Sf9 insect cells is substantially higher for an AcyP2* variant having a Phe residue in the tight turn than for a variant that lacks the Phe residue.

FIG. 4 in five parts illustrates in FIG. 4A the residues of loop 1 of the 34-residue WW domain from human Pin1 (Pin WW or Pin1 WW), a glycosylation-naïve β-sheet protein that contains a four-residue type II β-turn within a larger six-residue H-bonded loop. WW is SEQ ID NO:204, g-WW is SEQ ID NO:205, WW-F is SEQ ID NO:197, g-WW-F is SEQ ID NO:198, WW-T is SEQ ID NO:199, g-WW-T is SEQ ID NO:200, WW-F,T is SEQ ID NO:201 and g-WW-F,T is SEQ ID NO:202. FIG. 4B shows melting curves of the glycosylated (g-WW-F,T) and non-glycosylated (WW-F,T) variants; FIGS. 4D and 4E show illustrative plots from variable-temperature circular dichroism spectroscopy and laser temperature-jump studies; and FIG. 4F tabulates the thermal stability and folding rate data for the eight Pin WW variants studied.

FIG. 5 in three parts illustrates the triple mutant cycle cubes formed by protein 4, glycoprotein 4g, and their derivatives (FIG. 5A); protein 5, glycoprotein 5g, and their derivatives (FIG. 5B); and protein 6, glycoprotein 6g, and their derivatives (FIG. 5C).

FIG. 6 is a graph showing the origin of the increase in stability of Pin1 protein derivatives 4-F,T, 5-F,T, and 6-F,T upon glycosylation. $\Delta G_{f,\mathrm{total}}$ is the sum of the energetic effects of (1) the Asn19 to Asn(GlcNAc)19 mutation ($C_{N}$); (2) the two-way interaction between Phe16 and Asn(GlcNAc)19 ($C_{F,N}$); (3) the two-way interaction between Asn(GlcNAc)19 and Thr21 ($C_{N,T}$); and (4) the three-way interaction between Phe16, Asn(GlcNAc)19, and Thr21 ($C_{F,N,T}$). $C_{N}$, $C_{F,N}$, $C_{N,T}$, and $C_{F,N,T}$ are parameters obtained from least-squares regression of Equation A; error bars represent the corresponding standard errors.
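The decomposition in FIG. 6 follows the logic of a triple mutant cycle, $\Delta G_{f,\mathrm{total}} = C_{N} + C_{F,N} + C_{N,T} + C_{F,N,T}$. The Python sketch below illustrates that identity under stated assumptions: one folding free energy is assumed for each of the eight variants of a cube (Phe16 present or absent, Asn19 glycosylated or not, Thr21 present or absent), and the coupling terms are obtained exactly by inclusion-exclusion rather than by the least-squares regression of Equation A used for FIG. 6. All numbers and names are hypothetical and are not data from the figures.

```python
from itertools import combinations
from typing import Dict, FrozenSet

FACTORS = ("F", "N", "T")  # Phe16 present, Asn19 glycosylated, Thr21 present


def coupling_terms(dg: Dict[FrozenSet[str], float]) -> Dict[FrozenSet[str], float]:
    """Exact inclusion-exclusion decomposition of a 2 x 2 x 2 mutant cube.

    `dg` maps the set of features present in a variant to its folding free
    energy. The term for a feature set S is the sum over subsets U of S of
    (-1) ** (|S| - |U|) * dg[U]; for example C_N = dg[{N}] - dg[{}].
    """
    terms: Dict[FrozenSet[str], float] = {}
    for size in range(len(FACTORS) + 1):
        for combo in combinations(FACTORS, size):
            total = 0.0
            for k in range(size + 1):
                for sub in combinations(combo, k):
                    total += (-1) ** (size - k) * dg[frozenset(sub)]
            terms[frozenset(combo)] = total
    return terms


# Hypothetical folding free energies in kcal/mol (more negative = more stable).
dg = {
    frozenset(): -1.00, frozenset("F"): -1.10, frozenset("N"): -0.90,
    frozenset("T"): -1.05, frozenset("FN"): -1.50, frozenset("FT"): -1.20,
    frozenset("NT"): -1.20, frozenset("FNT"): -2.10,
}
c = coupling_terms(dg)

# Effect of glycosylation in the Phe16/Thr21 background ...
dg_total = dg[frozenset("FNT")] - dg[frozenset("FT")]
# ... equals C_N + C_F,N + C_N,T + C_F,N,T.
parts = sum(c[frozenset(s)] for s in ("N", "FN", "NT", "FNT"))
print(round(dg_total, 2), round(parts, 2))  # -> -0.9 -0.9
```

With exactly eight measurements the eight terms are fully determined; a regression such as Equation A becomes useful when replicate or additional data make the system over-determined and standard errors are wanted.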

DEFINITIONS

To facilitate understanding of the invention, a number of terms are defined below.

The term “antibody” refers to a molecule that is a member of a family of glycosylated proteins called immunoglobulins, which can specifically bind to an antigen.

The term “chimer” or “chimeric” is used to describe a polypeptide that is man-made and does not occur in nature. A contemplated chimeric polypeptide is encoded by a nucleotide sequence made by splicing together two or more complete or partial genes or cDNAs, or is constructed synthetically by in vitro methods. The pieces used can be from different species. In the present instance, the sequence of the sequon Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser (SEQ ID NO:001), as defined before, is typically spliced into a tight turn present in a pre-existing therapeutic polypeptide using genetic engineering techniques.

The term “polypeptide” is used herein to denote a sequence of about 15 to about 1000 peptide-bonded amino acid residues. A whole protein as well as a portion of a protein having the stated minimal length is a polypeptide.

The term “tight turn” is used herein as defined in Chou, Anal Biochem 286, 1-16 (2000) to mean a polypeptide site where (i) a polypeptide chain reverses its overall direction, and (ii) the amino acid residues directly involved in forming the turn are no more than six. Tight turns are generally categorized as δ-turn, γ-turn, β-turn, α-turn, and π-turn, which are formed by two-, three-, four-, five-, and six-amino-acid residues, respectively. According to the folding mode, each of such tight turns can be further classified into several different types. β-Turns, also known as “reverse turns,” are of most interest herein, and of those tight turns, the type-I β-bulge turn, the type-I′ β-turn and the type-II β-turn are of particular interest. Methods for predicting the presence of β-turns in polypeptides are provided in the citations of Chou, Anal Biochem 286, 1-16 (2000), and are otherwise well known in the art.

All amino acid residues identified herein are in the natural L-configuration. In keeping with standard polypeptide nomenclature, J. Biol. Chem., 243, 3557-3559 (1969), abbreviations for amino acid residues are as shown in the following Table of Correspondence:

TABLE OF CORRESPONDENCE

SYMBOL
1-Letter    3-Letter    AMINO ACID
Y           Tyr         L-tyrosine
G           Gly         glycine
F           Phe         L-phenylalanine
M           Met         L-methionine
A           Ala         L-alanine
S           Ser         L-serine
I           Ile         L-isoleucine
L           Leu         L-leucine
T           Thr         L-threonine
V           Val         L-valine
P           Pro         L-proline
K           Lys         L-lysine
H           His         L-histidine
Q           Gln         L-glutamine
E           Glu         L-glutamic acid
W           Trp         L-tryptophan
R           Arg         L-arginine
D           Asp         L-aspartic acid
N           Asn         L-asparagine
C           Cys         L-cysteine
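The Table of Correspondence maps directly onto a lookup table. The Python sketch below, whose function name is hypothetical, converts the hyphenated three-letter sequences used throughout this document into one-letter strings, a convenient form for pattern searches.

```python
# Three-letter to one-letter codes, as listed in the Table of Correspondence.
THREE_TO_ONE = {
    "Tyr": "Y", "Gly": "G", "Phe": "F", "Met": "M", "Ala": "A",
    "Ser": "S", "Ile": "I", "Leu": "L", "Thr": "T", "Val": "V",
    "Pro": "P", "Lys": "K", "His": "H", "Gln": "Q", "Glu": "E",
    "Trp": "W", "Arg": "R", "Asp": "D", "Asn": "N", "Cys": "C",
}


def to_one_letter(seq: str) -> str:
    """Convert a hyphen-separated three-letter sequence to one-letter codes.

    Raises KeyError for any token not in the table, such as the
    placeholders 'Xxx' or 'Asn(Glycan)'.
    """
    return "".join(THREE_TO_ONE[res] for res in seq.strip("-").split("-"))


print(to_one_letter("Lys-Ile-Phe-Ala-Asn-Gly-Thr"))  # -> KIFANGT
```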

DETAILED DESCRIPTION OF THE INVENTION

The present invention contemplates a therapeutic chimeric polypeptide that is typically present in isolated and purified form, and is a chimer of a pre-existing therapeutic polypeptide. The pre-existing therapeutic polypeptide has a length of about 15 to about 1000, preferably about 25 to about 500, and more preferably about 35 to about 300 amino acid residues.

A pre-existing therapeutic polypeptide is a polypeptide used as a pharmaceutical or nutraceutical that is administered to a human or other animal. A contemplated pre-existing therapeutic polypeptide is typically prepared outside the recipient's body (exogenously), but can be an endogenous polypeptide. A contemplated chimeric therapeutic polypeptide is typically prepared as an exogenous polypeptide, but can be produced endogenously via gene therapy.

A contemplated pre-existing therapeutic polypeptide exhibits a secondary structure that comprises at least one tight turn containing a sequence of four to about seven amino acid residues in which at least two amino acid side chains extend on the same side of the tight turn and are within less than about 7 Å of each other. The four to about seven amino acid residues present do not necessarily participate in the formation of the tight turn, but are present in the turn.

A contemplated chimeric therapeutic polypeptide has substantially the same length, at least one tight turn and substantially the same amino acid residue sequence as the pre-existing therapeutic polypeptide. However, a contemplated chimer differs in its total amino acid sequence from the pre-existing polypeptide, and can be one to about three residues longer or shorter than the pre-existing therapeutic polypeptide (substantially the same length), but is preferably the same length. The two sequences differ by the presence in the chimeric therapeutic polypeptide of the sequon, in the direction from left to right and from N-terminus to C-terminus,

Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser,  (SEQ ID NO:001)

wherein Aro is an aromatic amino acid residue such as histidine, phenylalanine, tyrosine or tryptophan, of which phenylalanine, tyrosine and tryptophan are preferred,

n is zero, 1, 2, 3 or 4,

Xxx is an amino acid residue other than an aromatic residue,

p is zero or one,

Zzz is any amino acid residue,

Asn is asparagine,

Yyy is any amino acid residue other than proline, and

Thr/Ser is one or the other of the amino acid residues threonine and serine, of which threonine is preferred.
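Because every position in the enhanced aromatic sequon is defined by a residue class, the definition can be expressed as a regular expression over one-letter sequences. The Python below is a sketch under two assumptions: His is counted as aromatic both for Aro and when excluding aromatic residues from Xxx, and lazy quantifiers are used so that the shortest match starting at each aromatic residue is reported. The names are hypothetical. A sequence match says nothing about whether the residues lie in a tight turn; that must be established structurally.

```python
import re
from typing import List, Tuple

AROMATIC = "HFYW"  # His counted as aromatic (assumption)

# Aro, then n = 0-4 non-aromatic Xxx, then p = 0-1 of any Zzz, then Asn,
# any residue except Pro (Yyy), then Thr or Ser. The lookahead lets matches
# starting at different positions overlap; lazy quantifiers keep each
# reported match as short as possible.
ENHANCED_SEQUON = re.compile(
    rf"(?=([{AROMATIC}][^{AROMATIC}]{{0,4}}?.??N[^P][ST]))"
)


def find_enhanced_sequons(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched residues) for each enhanced aromatic sequon."""
    return [(m.start(), m.group(1)) for m in ENHANCED_SEQUON.finditer(seq)]


print(find_enhanced_sequons("AAKIFANGTAA"))  # -> [(4, 'FANGT')]
print(find_enhanced_sequons("AAEILANGTAA"))  # -> [] (no aromatic residue)
```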

The above sequon is located at the same position in the tight turn as the sequence of four to about seven amino acid residues present in the pre-existing polypeptide such that the side chains of three amino acid residues—Aro, Asn and Thr/Ser—project on the same side of the turn and are within less than about 7 Å of each other.
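The geometric condition can be checked from atomic coordinates. Since FIG. 1 reports distances between side-chain beta carbons, the Python sketch below uses each residue's Cβ atom as a proxy for its side-chain position (an assumption) and tests whether every pair lies closer than 7 Å. The coordinates and the function name are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Tuple

Coord = Tuple[float, float, float]


def side_chains_clustered(cb: Dict[str, Coord], cutoff: float = 7.0) -> bool:
    """True if every pair of listed C-beta atoms is closer than `cutoff` angstroms."""
    return all(math.dist(cb[a], cb[b]) < cutoff for a, b in combinations(cb, 2))


# Hypothetical C-beta coordinates (angstroms) for Aro (i), Asn (i+2), Thr (i+4).
turn = {"Phe": (0.0, 0.0, 0.0), "Asn": (4.5, 2.0, 0.5), "Thr": (1.5, 5.0, 1.0)}
print(side_chains_clustered(turn))  # -> True
```

Checking all pairs, rather than only Aro to Thr/Ser, matches the requirement that all three side chains lie within about 7 Å of each other.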

The sequence of four to about seven amino acid residues present in the pre-existing polypeptide is preferably glycosylation-free. In preferred practice, the above sequon is glycosylated. When the sequon is glycosylated, the therapeutic chimeric polypeptide exhibits a folding stabilization enhancement by about −0.5 to about −4 kcal/mol compared to the pre-existing therapeutic polypeptide in non-glycosylated form.
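To give the −0.5 to −4 kcal/mol range a physical scale, standard two-state thermodynamics relates a change in folding free energy, ΔΔG_f, to the factor exp(−ΔΔG_f/RT) by which the folding equilibrium constant K_f = [folded]/[unfolded] changes. The Python sketch below assumes two-state folding and a temperature of 25 °C; it is an illustrative calculation, not a result from this disclosure, and the function name is hypothetical. Over the stated range the factor runs from roughly 2-fold to roughly 850-fold.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def fold_equilibrium_ratio(ddg_f: float, temp_k: float = 298.15) -> float:
    """Factor by which K_f = [folded]/[unfolded] changes for a given ddG_f (kcal/mol).

    Follows from K_f = exp(-dG_f / RT) for a two-state folder, so a negative
    ddG_f (stabilization) gives a factor greater than 1.
    """
    return math.exp(-ddg_f / (R_KCAL * temp_k))


for ddg in (-0.5, -2.0, -4.0):
    print(f"ddG_f = {ddg:+.1f} kcal/mol: K_f increases {fold_equilibrium_ratio(ddg):.0f}-fold")
```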

Returning to the formula for the above-mentioned sequon, it is seen that the length is four residues when n=p=zero, five residues when n=zero and p=1 or when n=1 and p=zero, and nine residues when n=4 and p=1. Of course, intermediate lengths between four and nine residues are also contemplated. It is additionally preferred that the residues Xxx, Yyy and Zzz be other than cysteine.
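The length arithmetic follows from counting one Aro residue, n Xxx residues, p Zzz residues and the three residues of Asn-Yyy-Thr/Ser. The short Python sketch below enumerates every allowed (n, p) pair; the variable name is illustrative.

```python
# Sequon length = 1 (Aro) + n (Xxx) + p (Zzz) + 3 (Asn-Yyy-Thr/Ser).
sequon_lengths = {(n, p): 1 + n + p + 3 for n in range(5) for p in range(2)}

print(sequon_lengths[(0, 0)], sequon_lengths[(4, 1)])  # -> 4 9
print(sorted(set(sequon_lengths.values())))  # -> [4, 5, 6, 7, 8, 9]
```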

The above Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser (SEQ ID NO:001) sequence is referred to herein as an “enhanced aromatic sequon” because of its increased propensity to form a stabilizing compact structure upon glycosylation, relative to the canonical Asn-Yyy-Thr sequon, and because it is more efficiently glycosylated by eukaryotic cells than the Asn-Yyy-Thr sequon [Culyba et al., Science 331, 571-575 (2011)].

The sequon Asn-Yyy-Thr/Ser is a sequon recognized by the enzyme oligosaccharyl transferase (OST). OST attaches the highly conserved Glc₃Man₉GlcNAc₂ glycan en bloc to the N atom of the Asn amido side chain. A glycan bonded to the amido nitrogen of an asparagine side chain is illustrated herein as “Asn(Glycan)” to denote any glycan.

During the passage of a glycosylated polypeptide through the endoplasmic reticulum (ER), several sugars, including all of the glucose (Glc) residues and several of the mannose (Man) residues, are removed from the glycan portion. The specific resulting glycan depends upon the plant or animal in which the polypeptide is expressed, and upon the stage after expression at which the glycopolypeptide is recovered.

Illustrative glycosylated Asn residues include those bearing one N-acetylglucosamine [Asn(GlcNAc)], two N-acetylglucosamines [Asn(GlcNAc2)], one mannose and two N-acetylglucosamines [Asn(ManGlcNAc2)], or three mannoses and two N-acetylglucosamines, a glycan referred to as “paucimannose” (Man3GlcNAc2) that forms the glycosylated residue Asn(Man3GlcNAc2), and the like. Additionally, glycosylated asparagine residues can be used in an in vitro polypeptide synthetic scheme.

In another embodiment, the sequon contemplated has the formula, from left to right and in the direction from N-terminus to C-terminus,

-Lys-(Zzz)m-Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser,  (SEQ ID NO:004)

wherein

m is zero, 1, 2, or 3, and

Lys is lysine, and

Zzz, Aro, Xxx, n, p, Yyy and Thr/Ser are as defined previously.
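The Lys-extended sequon can be searched for in the same way as the shorter enhanced aromatic sequon, by prefixing Lys and zero to three arbitrary residues. The Python below is a hedged sketch: it treats His as aromatic, uses lazy quantifiers to report the shortest match beginning at each Lys, and its names are hypothetical.

```python
import re
from typing import List, Tuple

# Lys, m = 0-3 of any Zzz, then Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser
# (n = 0-4 non-aromatic, p = 0-1 of any residue; His treated as aromatic).
LYS_SEQUON = re.compile(r"(?=(K.{0,3}?[HFYW][^HFYW]{0,4}?.??N[^P][ST]))")


def find_lys_sequons(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched residues) for each Lys-extended sequon."""
    return [(m.start(), m.group(1)) for m in LYS_SEQUON.finditer(seq)]


print(find_lys_sequons("AAKIFANGTAA"))  # -> [(2, 'KIFANGT')]
```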

Again, as in the previous discussion, this sequon is positioned in the tight turn sequence of the chimeric polypeptide at the same position in the tight turn as the sequence of four to about seven amino acid residues present in the pre-existing polypeptide such that the side chains of four amino acid residues—Lys, Aro, Asn and Thr/Ser—project on the same side of the turn and are within less than about 7 Å of each other. That is, each of the Lys, Aro and Thr/Ser residue side chains interacts with the glycan of the Asn residue after proper folding, as for example, after expression and passage of the expressed polypeptide through the ER.

Another way to identify the position of the four to about seven amino acid residues present in the pre-existing polypeptide is through the numbering system used for the residues of a hydrogen-bonded β-turn, even though a hydrogen bond need not be present in a contemplated tight turn. In this system, the N-terminal residue of the sequence that participates in the hydrogen bond is designated the “i” residue. Going in the direction toward the C-terminus of the sequence, the residues are numbered “i+1”, “i+2”, “i+3”, “i+4”, “i+5”, etc. Residues to the N-terminal side of residue “i” are numbered “i−1”, “i−2”, “i−3”, “i−4”, etc.

Illustrative examples of this nomenclature can be seen hereinafter, such as in the work regarding the type-I β-bulge turn present in the non-therapeutic, genetically engineered polypeptide rat glycoprotein CD2 (RnCD2*). The sequon in that type-I β-bulge turn was engineered to be Asn-Gly-Thr, within the seven-residue sequence Glu-Ile-Leu-Ala-Asn-Gly-Thr (SEQ ID NO:006), which was replaced in the chimeric polypeptide by the sequon -Lys-(Zzz)m-Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser (SEQ ID NO:004), where m=1, n=1 and p=zero, to give Lys-Ile-Phe-Ala-Asn-Gly-Thr (SEQ ID NO:007). The pre-existing sequon in the pre-existing RnCD2* is Asn-Gly-Thr, where the Asn is at the i position, the Gly is at the i+1 position, and the Thr is at the i+2 position. In the chimeric polypeptide, the Asn, Gly and Thr are as before, and the Lys, Ile, Phe, and Ala are at positions i−4, i−3, i−2, and i−1, respectively.

In the above polypeptide Lys-Ile-Phe-Ala-Asn-Gly-Thr (SEQ ID NO:007), the Phe, Thr and Asn(Glycan) interact, and the Lys also appears to interact with those residues. As a result, from the viewpoint of the chimeric therapeutic polypeptide, one can base the numbering on the Phe as the i residue, the Ala as i+1, the Asn as i+2, the Gly as i+3, and the Thr as i+4. Both methods of numbering are used herein.
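Both numbering schemes amount to choosing a different anchor residue as i. The hypothetical Python helper below labels each residue of Lys-Ile-Phe-Ala-Asn-Gly-Thr (SEQ ID NO:007) relative to a chosen anchor, reproducing the Asn-based and Phe-based numbering described above; it writes i-1 with an ASCII hyphen in place of the minus sign.

```python
from typing import List, Tuple


def turn_labels(residues: List[str], i_index: int) -> List[Tuple[str, str]]:
    """Label each residue relative to the residue at `i_index`, which becomes 'i'."""
    labels = []
    for k, res in enumerate(residues):
        offset = k - i_index
        if offset == 0:
            tag = "i"
        elif offset > 0:
            tag = f"i+{offset}"
        else:
            tag = f"i-{-offset}"
        labels.append((res, tag))
    return labels


chimera = "Lys-Ile-Phe-Ala-Asn-Gly-Thr".split("-")
print(turn_labels(chimera, chimera.index("Asn")))  # Asn as i (hydrogen-bond numbering)
print(turn_labels(chimera, chimera.index("Phe")))  # Phe as i (sequon-based numbering)
```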

Turning again to the before-discussed preferred sequon,

Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser  (SEQ ID NO:001),

in one preferred embodiment, "n" is 1 and "p" is 1, and the chimeric polypeptide contains a type II β-turn in a six-residue loop. The resulting enhanced aromatic sequon present in the chimeric polypeptide has the sequence Aro-Xxx-Zzz-Asn-Yyy-Thr/Ser (SEQ ID NO:008).

In another preferred embodiment, “n” is 1 and “p” is zero. The pre-existing and chimeric polypeptide sequences differ by the presence in the chimeric therapeutic polypeptide of the sequon, Aro-Xxx-Asn-Yyy-Thr/Ser (SEQ ID NO:002) as defined above. The chimeric polypeptide preferably contains a five-residue type I β-bulge turn.

In still another preferred embodiment, “n” is zero and “p” is zero. The pre-existing and chimeric polypeptide sequences differ by the presence in the chimeric therapeutic polypeptide of the sequon, Aro-Asn-Yyy-Thr/Ser (SEQ ID NO:003) as defined above. Here, a preferred chimeric polypeptide contains a four-residue type I′ β-turn.
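The three preferred embodiments differ only in n and p, and the sequon spans n + p + 4 residues, which matches the loop size named for each embodiment. The Python sketch below, using hypothetical names (`PREFERRED_TURNS`, `sequon_template`), spells out the template and its length for each case.

```python
# Turn types named for the preferred embodiments, keyed by (n, p)
PREFERRED_TURNS = {
    (1, 1): "type II β-turn in a six-residue loop",
    (1, 0): "five-residue type I β-bulge turn",
    (0, 0): "four-residue type I′ β-turn",
}


def sequon_template(n: int, p: int) -> str:
    """Spell out Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser for the given n and p."""
    if not (0 <= n <= 4 and p in (0, 1)):
        raise ValueError("n must be 0 to 4 and p must be 0 or 1")
    return "-".join(["Aro"] + ["Xxx"] * n + ["Zzz"] * p + ["Asn", "Yyy", "Thr/Ser"])


for (n, p), turn in PREFERRED_TURNS.items():
    template = sequon_template(n, p)
    length = len(template.split("-"))  # equals n + p + 4
    print(f"n={n}, p={p}: {template} ({length} residues): {turn}")
```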

One group of exemplary pre-existing therapeutic polypeptides is constituted of therapeutic antibodies, and particularly the heavy chains of human antibodies. The heavy chain of all IgG-type antibodies has three constant domains: CH1, CH2, and CH3. The CH2 and CH3 domains form what is called the Fc fragment, or the crystallizable fragment. A complete human antibody heavy chain contains about 450 amino acid residues, of which about one-half are present in the Fc portion.

The following Table A lists the USAN names of therapeutic antibodies that have been approved or have at some point been in clinical trials. The CH2 and CH3 domains of human antibody Fc portions contain reverse turns, each of which can be modified to form one or two enhanced aromatic sequons. Table B, which follows Table A, provides the pre-existing tight-turn sequences of illustrative antibodies or antibody Fc portions, together with exemplary replacement sequons contemplated herein that can provide enhanced folding stability.

TABLE A
USAN Names of Therapeutic Antibodies

abagovomab, adalimumab, alemtuzumab, apolizumab, basiliximab, basliximab, belimumab, bevacizumab, canakinumab, catumaxomab, certolizumab, cetuximab, cixutumumab, conatumumab, consumab, daclizumab, dalotuzumab, denosumab, dermab, eculizumab, edrecolomab, efalizumab, efungumab, elotuzumab, epratuzumab, ertumaxomab, etaracizumab, figitumumab, galiximab, ganitumab, gemtuzumab, genmab, golimumab, ibalizumab, ibritumomab, infliximab, ipilimumab, lexatumumab, lintuzumab, lumiliximab, mapatumumab, matuzumab, mepolizumab, milatuzumab, motavizumab, natalizumab, necitumumab, nimotuzumab, ofatumumab, omalizumab, oregovomab, otelixizumab, palivizumab, panitumumab, pertuzumab, ramucirumab, ranibizumab, reslizumab, rituximab, siplizumab, sonepcizumab, tanezumab, tefibazumab, teplizumab, ticilimumab, tocilizumab, tositumomab, trastuxumab, trastuzumab, tremelimumab, tucotuzumab, ustekinumab, veltuzumab, visilizumab, volociximab, zalutumumab

TABLE B
Drug Bank ID Number | USAN Name | Polypeptide Identity | Native Sequence [SEQ ID NO] | Exemplary Sequon [SEQ ID NO]
DB00078 | Ibritumomab | Mouse Anti-CD20 Heavy chain 1 | DYNST [009] | FYNST [010]
″ | ″ | Mouse Anti-CD20 Heavy chain 1 | DSDGS [011] | YSNGS [012]
DB00078 | Ibritumomab | Mouse Anti-CD20 Heavy chain 2 | DYNST [013] | FYNSS [014]
″ | ″ | Mouse Anti-CD20 Heavy chain 2 | DSDGS [015] | YSNGT [016]
DB00028 | Immune globulin | IgG1 | QYNST [017] | YYNST [018]
″ | Immune globulin | ″ | DSDGS [019] | YSNGS [020]
DB00005 | Etanercept | TNF receptor 2 fused to human Fc of IgG1 | QYNST [021] | WYNST [022]
″ | ″ | TNF receptor 2 fused to human Fc of IgG1 | DSDGS [023] | HSNGT [024]
DB00087 | Alemtuzumab | 1CE1:H CAMPATH-1H: Heavy Chain 1 | DSDGS [025] | WSNGT [026]
″ | ″ | 1CE1:H CAMPATH-1H: Heavy Chain 2 | DSDGS [027] | FSNGT [028]
DB00113 | Arcitumomab | 1clo: Anti-CEA heavy chain 1 | DYNST [029] | FYNST [030]
″ | ″ | 1clo: Anti-CEA heavy chain 1 | DSDGS [031] | FSNGT [032]
DB00113 | Arcitumomab | 1clo: Anti-CEA heavy chain 2 | DYNST [033] | FYNST [034]
″ | ″ | 1clo: Anti-CEA heavy chain 2 | DSDGS [035] | WSNGS [036]
DB00043 | Omalizumab | Anti-IgE antibody VH domain chain 1 | QYNST [037] | WYNST [038]
DB00043 | ″ | Anti-IgE antibody VH domain chain 1 | DSDGS [039] | YSNGT [040]
DB00043 | Omalizumab | Anti-IgE antibody VH domain chain 2 | QYNST [041] | HYNST [042]
DB00043 | ″ | Anti-IgE antibody VH domain chain 2 | DSDGS [043] | YSNGS [044]
DB00057 | Satumomab Pendetide | Heavy chain 1 B72.3 | DYNST [045] | FYNST [046]
″ | Satumomab Pendetide | Heavy chain 1 B72.3 | DSDGS [047] | HSNGT [048]
DB00057 | Satumomab Pendetide | Heavy chain 2 B72.3 | DYNST [049] | WYNST [050]
″ | Satumomab Pendetide | Heavy chain 2 B72.3 | DSDGS [051] | YSNGT [052]
DB00092 | Alefacept | Human LFA fused to human Fc | QYNST [053] | WYNST [054]
″ | ″ | Human LFA fused to human Fc | DSDGS [055] | YSNGT [056]
DB00111 | Daclizumab | Humanized Anti-CD25 Heavy Chain 1 | QYNST [057] | FYNST [058]
″ | ″ | Humanized Anti-CD25 Heavy Chain 1 | DSDGS [059] | YSNGS [060]
DB00111 | Daclizumab | Humanized Anti-CD25 Heavy Chain 2 | QYNST [061] | HYNST [062]
″ | ″ | Humanized Anti-CD25 Heavy Chain 2 | DSDGS [063] | FSNGS [064]
DB00002 | Cetuximab | Anti-EGFR heavy chain 1 | DSDGS [065] | WSNGS [066]
DB00002 | Cetuximab | Anti-EGFR heavy chain 2 | DSDGS [067] | WSNGT [068]
DB00081 | Tositumomab | Mouse-Human chimeric Anti-CD20 heavy chain 1 | QYNST [069] | FYNST [070]
″ | ″ | Mouse-Human chimeric Anti-CD20 heavy chain 1 | DSDGS [071] | FSNGT [072]
DB00081 | Tositumomab | Mouse-Human chimeric Anti-CD20 heavy chain 2 | QYNST [073] | FYNSS [074]
″ | ″ | Mouse-Human chimeric Anti-CD20 heavy chain 2 | DSDGS [075] | WSNGT [076]
DB00072 | Trastuzumab | Anti-HER2 Heavy chain 1 | QYNST [077] | FYNST [078]
″ | ″ | Anti-HER2 Heavy chain 1 | DSDGS [079] | WSNGS [080]
DB00072 | Trastuzumab | Anti-HER2 Heavy chain 2 | QYNST [081] | FYNST [082]
″ | ″ | Anti-HER2 Heavy chain 2 | DSDGS [083] | WSNGS [084]
DB00075 | Muromonab | 1SY6:H OKT3 Heavy Chain 1 | QYNST [085] | WYNST [086]
″ | ″ | 1SY6:H OKT3 Heavy Chain 1 | DSDGS [087] | FSNGS [088]
DB00075 | Muromonab | 1SY6:H OKT3 Heavy Chain 2 | QYNST [089] | HYNST [090]
″ | ″ | 1SY6:H OKT3 Heavy Chain 2 | DSDGS [091] | YSNGS [092]
DB00054 | Abciximab | 1TXV:H ReoPro-like antibody Heavy Chain 1 | QYNST [093] | WYNST [094]
″ | ″ | 1TXV:H ReoPro-like antibody Heavy Chain 1 | DSDGS [095] | WSNGT [096]
DB00054 | Abciximab | 1TXV:H ReoPro-like antibody Heavy Chain 2 | QYNST [097] | FYNST [098]
″ | ″ | 1TXV:H ReoPro-like antibody Heavy Chain 2 | DSDGS [099] | FSNGT [100]
DB00074 | Basiliximab | 1MIM:H Anti-CD25 antibody heavy CHIMERIC chain 1 | QYNST [101] | YYNST [102]
″ | ″ | 1MIM:H Anti-CD25 antibody heavy CHIMERIC chain 1 | DSDGS [103] | FSNGT [104]
DB00074 | Basiliximab | 1MIM:H Anti-CD25 antibody heavy CHIMERIC chain 2 | QYNST [105] | WYNST [106]
″ | ″ | 1MIM:H Anti-CD25 antibody heavy CHIMERIC chain 2 | DSDGS [107] | FSNGS [108]
DB00073 | Rituximab | Mouse-Human chimeric Anti-CD20 Heavy Chain 1 | QYNST [109] | HYNSS [110]
″ | ″ | Mouse-Human chimeric Anti-CD20 Heavy Chain 1 | DSDGS [111] | YSNGS [112]
DB00073 | Rituximab | Mouse-Human chimeric Anti-CD20 Heavy Chain 2 | QYNST [113] | FYNSS [114]
″ | ″ | Mouse-Human chimeric Anti-CD20 Heavy Chain 2 | DSDGS [115] | WSNGS [116]
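Each Table B pair can be checked mechanically. The Python sketch below is a hypothetical helper, not something taken from the source: it lists the substitutions that convert a native five-residue window into its exemplary sequon (positions numbered 1 to 5 within the window) and checks that the result has an aromatic residue (Phe, Trp, Tyr or His, the residues the table uses) at position 1, Asn at position 3, a residue other than Pro at position 4, and Thr or Ser at position 5.

```python
AROMATIC = set("FWYH")  # the aromatic residues used at position 1 in Table B


def substitutions(native: str, sequon: str) -> list[str]:
    """Changes as <old><position><new>, positions numbered 1-5 within the window."""
    if len(native) != len(sequon):
        raise ValueError("windows must have the same length")
    return [f"{a}{k}{b}" for k, (a, b) in enumerate(zip(native, sequon), start=1) if a != b]


def fits_five_residue_sequon(seq: str) -> bool:
    """Aro at 1, Asn at 3, any residue but Pro at 4, Thr/Ser at 5."""
    return (len(seq) == 5 and seq[0] in AROMATIC and seq[2] == "N"
            and seq[3] != "P" and seq[4] in "ST")


for native, sequon in [("DYNST", "FYNST"), ("DSDGS", "YSNGS"),
                       ("QYNST", "WYNST"), ("DSDGS", "HSNGT")]:
    print(native, "->", sequon, substitutions(native, sequon),
          fits_five_residue_sequon(sequon))
```

The QYNST/DYNST windows already carry the Asn-Yyy-Thr core, so a single change at position 1 suffices, whereas the DSDGS windows also need Asp-to-Asn at position 3.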

Another group of exemplary pre-existing therapeutic polypeptides is hormones. Illustrative of such hormones are erythropoietin, darbepoetin alfa (an erythropoietin variant with two additional N-glycans), interferon beta, follicle stimulating hormone, follitropin beta, peginterferon alfa-2b, becaplermin, sermorelin, somatropin, pramlintide, sargramostim, insulin, thyrotropin alfa, choriogonadotropin alfa, lepirudin, lutropin alfa, secretin, bivalirudin, corticotropin, exenatide and the like.

Yet another group of exemplary pre-existing therapeutic polypeptides is enzymes. Illustrative of such enzymes are laronidase, collagenase, pancrelipase, streptokinase, urokinase, imiglucerase, reteplase, coagulation factor VII, coagulation factor IX, alglucerase, agalsidase beta, asparaginase, hyaluronidase, tenecteplase, pegademase bovine, dornase alfa, anistreplase, pegaspargase, alteplase, and the like.

Further pre-existing polypeptides include denileukin diftitox, botulinum toxin type B, nesiritide, pegfilgrastim, human serum albumin, mecasermin, aldesleukin, antihemophilic factor, aprotinin, palifermin, peginterferon alfa-2a, teriparatide, urofollitropin, anakinra, menotropins, OspA lipoprotein, pegvisomant, thymalfasin, follitropin beta, peginterferon alfa-2b, alpha-1-proteinase inhibitor, filgrastim, oprelvekin, rasburicase, darbepoetin alfa, enfuvirtide and the like.

Table C, below, illustrates five-residue native sequences within tight turns of two of the above polypeptides: the alpha chain of follitropin beta, which has a type VI β-turn, and imiglucerase, which has a type I β-bulge turn. Also illustrated for each of those polypeptides are replacement sequons for the illustrated native five-residue sequences.

TABLE C
Drug Bank ID Number | USAN Name | Polypeptide Identity | Native Sequence [SEQ ID NO] | Exemplary Sequon [SEQ ID NO]
DB00066 | Follitropin beta alpha chain | Human follicle stimulating hormone | VMGGF [117] | FMNGT [118]
″ | Follitropin beta alpha chain | Human follicle stimulating hormone | VMGGF [117] | WMNGT [119]
″ | Follitropin beta alpha chain | Human follicle stimulating hormone | VMGGF [117] | YMNGT [120]
″ | Follitropin beta alpha chain | Human follicle stimulating hormone | VMGGF [117] | HMNGT [121]
″ | Follitropin beta alpha chain | Human follicle stimulating hormone | VMGGF [117] | FMNGS [122]
″ | Follitropin beta alpha chain | Human follicle stimulating hormone | VMGGF [117] | WMNGS [123]
″ | Follitropin beta alpha chain | Human follicle stimulating hormone | VMGGF [117] | YMNGS [124]
″ | Follitropin beta alpha chain | Human follicle stimulating hormone | VMGGF [117] | HMNGS [125]
DB00053 | Imiglucerase | Human beta-glucosidase | HPDGS [126] | FPNGT [127]
″ | ″ | Human beta-glucosidase | HPDGS [126] | WPNGT [128]
″ | ″ | Human beta-glucosidase | HPDGS [126] | YPNGT [129]
″ | ″ | Human beta-glucosidase | HPDGS [126] | HPNGT [130]
″ | ″ | Human beta-glucosidase | HPDGS [126] | FPNGS [131]
″ | ″ | Human beta-glucosidase | HPDGS [126] | WPNGS [132]
″ | ″ | Human beta-glucosidase | HPDGS [126] | YPNGS [133]
″ | ″ | Human beta-glucosidase | HPDGS [126] | HPNGS [134]
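The eight exemplary sequons listed for each native sequence in Table C follow a regular pattern. Position 1 becomes Phe, Trp, Tyr or His, position 3 becomes Asn, position 5 becomes Thr or Ser, and positions 2 and 4 are kept. The pattern is inferred from Table C rather than stated in the source. Assuming it, a minimal Python sketch (hypothetical helper `candidate_sequons`) regenerates each list in the table's order.

```python
from itertools import product

AROMATICS = "FWYH"  # order in which Table C lists the aromatic residues


def candidate_sequons(native: str) -> list[str]:
    """Aro at 1, native residue kept at 2, Asn at 3, native residue kept at 4, Thr/Ser at 5."""
    if len(native) != 5 or native[3] == "P":
        raise ValueError("expects a five-residue window with a non-Pro residue at position 4")
    core = native[1] + "N" + native[3]
    # Thr variants first, then Ser variants, each in F, W, Y, H order
    return [aro + core + ts for ts, aro in product("TS", AROMATICS)]


print(candidate_sequons("VMGGF"))  # follitropin beta alpha chain window
print(candidate_sequons("HPDGS"))  # imiglucerase window
```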

Nearly 9% of the reverse turns in the Protein Data Bank (PDB) are type I β-bulge turns [Sibanda et al., J Mol Biol 206(4), 759-777 (1989); and Oliva et al., J Mol Biol 266(4), 814-830 (1997)], so installing the Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser (SEQ ID NO:001) enhanced aromatic sequon could be an attractive strategy for increasing the stability of the many proteins that harbor type I β-bulge turns. Identifying other suitable reverse turn types that could position Phe, GlcNAc1, and Thr close enough to facilitate a tripartite interaction would further expand the number of proteins that could benefit from the increased stability and possibly the increased glycosylation efficiency afforded by the enhanced aromatic sequon.

Illustrative glycosylated Asn residues include those bearing one N-acetylglucosamine [Asn(GlcNAc)], two N-acetylglucosamines [Asn(GlcNAc2)], one mannose and two N-acetylglucosamines [Asn(ManGlcNAc2)], and three mannoses and two N-acetylglucosamines, a glycan referred to as "paucimannose" (Man3GlcNAc2) that forms the glycosylated residue Asn(Man3GlcNAc2), and the like. Additionally, glycosylated asparagine residues can be used in an in vitro polypeptide synthetic scheme.
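Because each glycoform adds a defined mass to the Asn side chain, the glycoforms can be told apart by mass spectrometry (ESI-MS is used in the examples hereinafter). The Python sketch below computes the expected mass increase for the glycoforms named above. It assumes standard monoisotopic residue masses for GlcNAc (203.0794 Da) and mannose (162.0528 Da), ignores further modifications such as fucosylation, and the names used are hypothetical.

```python
# Assumed monoisotopic residue masses (Da) of sugar units within a glycan
# (free monosaccharide minus water).
GLCNAC = 203.0794  # N-acetylglucosamine residue, C8H13NO5
MAN = 162.0528     # mannose (hexose) residue, C6H10O5

# glycoform -> (GlcNAc count, Man count), following the list above
GLYCOFORMS = {
    "GlcNAc": (1, 0),
    "GlcNAc2": (2, 0),
    "ManGlcNAc2": (2, 1),
    "Man3GlcNAc2": (2, 3),  # paucimannose
}


def mass_increment(glycoform: str) -> float:
    """Mass added to the Asn residue by the given glycan, in Da."""
    n_glcnac, n_man = GLYCOFORMS[glycoform]
    return n_glcnac * GLCNAC + n_man * MAN


for name in GLYCOFORMS:
    print(f"Asn({name}): +{mass_increment(name):.4f} Da")
```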

Preparation of a Chimeric Therapeutic Polypeptide

A method of enhancing folded stabilization of a chimeric therapeutic polypeptide compared to a pre-existing therapeutic polypeptide is also contemplated. The pre-existing therapeutic polypeptide comprises a sequence of about 15 to about 1000 amino acid residues, preferably about 25 to about 500 residues, and more preferably about 35 to about 300 residues, and exhibits a secondary structure that comprises at least one tight turn in which the side chains of two residues in a sequence of four to about seven amino acid residues within the tight turn project on the same side of the turn and are within less than about 7 Å of each other. Those four to about seven amino acid residues are preferably glycosylation-free. In accordance with that method, a therapeutic chimeric polypeptide is prepared that has the same length and substantially the same sequence as the pre-existing therapeutic polypeptide and exhibits a secondary structure comprising at least one tight turn at the same sequence position as in the pre-existing therapeutic polypeptide, except that said sequence of four to about seven amino acid residues is replaced with the sequon, in the direction from left to right and from N-terminus to C-terminus,

Aro-(Xxx)n-(Zzz)p-Asn(Glycan)-Yyy-Thr/Ser,  (SEQ ID NO:005)

wherein

Aro is an aromatic amino acid residue,

n is zero, 1, 2, 3 or 4,

Xxx is an amino acid residue other than an aromatic residue,

p is zero or 1,

Zzz is any amino acid residue,

Asn(Glycan) is glycosylated asparagine,

Yyy is any amino acid residue other than proline,

Thr/Ser is one or the other of the amino acid residues threonine and serine, and

the side chains of the Aro, Asn(Glycan) and Thr/Ser amino acid residues project on the same side of the turn and are within less than about 7 Å of each other.
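As a sequence-level illustration of the definitions above, the following Python sketch scans a one-letter sequence for stretches that satisfy the Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser pattern. It is a hypothetical helper, not part of the disclosed method. It assumes His counts as aromatic (as in Tables B and C) and uses the fact that Aro lies n + p + 1 residues, that is one to six, upstream of Asn. It cannot assess the structural requirement that the side chains project on the same side of the turn within about 7 Å.

```python
AROMATIC = set("FWYH")  # assumption: His counted as aromatic, as in Tables B and C


def enhanced_aromatic_sequons(seq: str) -> list[tuple[int, int]]:
    """Return 0-based (aro_index, asn_index) pairs matching
    Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser with n = 0..4 and p = 0 or 1."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        # Asn-Yyy-Thr/Ser core, where Yyy is any residue except Pro
        if seq[i] != "N" or seq[i + 1] == "P" or seq[i + 2] not in "ST":
            continue
        # Aro sits d = n + p + 1 residues before Asn, so d runs from 1 to 6
        for d in range(1, 7):
            j = i - d
            if j < 0:
                break
            if seq[j] not in AROMATIC:
                continue
            # Residues between Aro and Asn must be non-aromatic Xxx, except the
            # one immediately before Asn, which may be the optional Zzz (any residue)
            xxx = seq[j + 1 : i - 1]
            if all(r not in AROMATIC for r in xxx):
                hits.append((j, i))
    return hits


print(enhanced_aromatic_sequons("EILANGT"))  # pre-existing RnCD2* turn (SEQ ID NO:006): []
print(enhanced_aromatic_sequons("KIFANGT"))  # chimeric turn (SEQ ID NO:007): [(2, 4)]
```

A hit only flags a candidate; whether the Aro, Asn and Thr/Ser side chains actually cluster in the folded protein has to come from structural data.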

In some embodiments, the Asn(Glycan) is Asn(GlcNAc). In other embodiments, Asn(Glycan) is Asn(GlcNAc2), whereas in other embodiments Asn(Glycan) is Asn(ManGlcNAc2). In still other embodiments, the glycan of Asn(Glycan) is paucimannose.

A contemplated polypeptide can be prepared in a number of ways. Longer polypeptides, such as those of about 50 residues and longer, are most readily prepared by genetic engineering following well-known techniques. Thus, for example, a therapeutic chimeric polypeptide is prepared by expressing a nucleic acid sequence that encodes the polypeptide sequence of the therapeutic chimeric polypeptide in a host cell that glycosylates the amino acid sequence Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser (SEQ ID NO:001), when present in a polypeptide sequence expressed therein, to form the sequence Aro-(Xxx)n-(Zzz)p-Asn(Glycan)-Yyy-Thr/Ser (SEQ ID NO:005). Examples of such preparations are illustrated hereinafter.

In such a preparation, any of several eukaryotic host cells can be used to prepare a glycosylated chimeric therapeutic polypeptide. Examples include yeast cells such as Saccharomyces cerevisiae and Pichia pastoris, mammalian cells such as CHO cells, insect cells such as Spodoptera frugiperda (Sf9) cells, and plant cells such as those of tobacco (Nicotiana tabacum M38) or Arabidopsis thaliana. Unstabilized (non-glycosylated) therapeutic polypeptides useful for comparative purposes can be expressed in bacterial cells, such as E. coli, that do not glycosylate their expressed polypeptides.

In the following examples, an illustrative polypeptide is expressed as a fusion protein that contains isolation and purification sequences. One such sequence is a hexa-histidine sequence at the N-terminus of the polypeptide that assists in purifying and isolating the desired chimer via binding to a nickel affinity ligand on a solid support. Additional affinity tags include the 8-residue FLAG-tag (DYKDDDDK) and the myc-tag, which are bound by solid-support-linked antibody binding sites. The so-called Strep-tag® II, which consists of a streptavidin-recognizing octapeptide, can be affinity-purified using a matrix bearing a modified streptavidin and eluted with a biotin analog.

Because it is desirable to remove most tags at the end of the purification process, considerable advances have been made in the design of affinity tags so that they can be cleaved without leaving any residues behind and so that the entire process of purification and cleavage is simplified. One such system is the "Profinity eXact™" fusion-tag system (Bio-Rad Laboratories, Hercules, Calif.), which uses an immobilized subtilisin protease to carry out both affinity binding and tag cleavage. The protease not only binds and recognizes the tag but, upon application of the elution buffer, also cleaves the tag from the fusion protein precisely at the end of the cleavage recognition sequence. This delivers a native, tag-free polypeptide in a single step. Another system for simple purification of proteins is based on elastin-like polypeptides (ELPs) and an intein. ELPs consist of several repeats of a peptide motif and undergo a reversible transition from soluble to insoluble upon a temperature increase. The fusion protein is purified by temperature-induced aggregation and separation by centrifugation, and the intein is used for tag removal. No affinity columns are needed for initial purification.

Solubility-enhancing tags are generally large peptides or proteins that increase the expression and solubility of fusion proteins. Fusion tags like GST and MBP also act as affinity tags and as a result, they are very popular for protein purification. Other fusion tags like NusA, thioredoxin (TRX), small ubiquitin-like modifier (SUMO), and ubiquitin (Ub), on the other hand, require additional affinity tags for use in protein purification.

An expressed polypeptide also preferably includes a peptide cleavage site so that a purified polypeptide can be cleaved from any tags used in its purification and isolation. This cleavage or tag-removal step almost always involves using a protease to cleave a specific peptide bond between the tag and the protein of interest. A small number of highly specific proteases are routinely used for this purpose. These include the tobacco etch virus (TEV) protease; thrombin (factor IIa, fIIa) and factor Xa (fXa) from the blood coagulation cascade; enterokinase (EK), the enzyme that activates trypsinogen to trypsin in the mammalian intestinal tract; the SUMO proteases (Ulp1, Senp2, and SUMOstar), which are involved in the maturation and deconjugation of SUMO; and a mutated form of the Bacillus subtilis protease subtilisin BPN′ (Bio-Rad's Profinity eXact system). Many of these enzymes have been genetically engineered to enhance their stability (e.g., AcTEV™, ProTEV) or their specificity (e.g., SUMOstar, Profinity). With the exception of the SUMO proteases, all of these enzymes have the potential to cleave within the protein of interest. The SUMO proteases recognize not only their specific cleavage site but also the tertiary structure of SUMO itself, giving them a very high degree of specificity.
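As an illustration of how a protease cleavage site is used to remove a tag, the Python sketch below splits a hypothetical His6-tagged construct at a TEV protease site. It assumes the commonly used ENLYFQ recognition sequence, with cleavage between the Gln and the following residue, and for simplicity accepts only Gly or Ser at that following (P1′) position. The construct is invented for illustration. The Gly left on the released segment shows why scarless systems such as Profinity eXact are attractive.

```python
TEV_SITE = "ENLYFQ"  # TEV protease recognition sequence; the cut falls after the Gln


def tev_cleave(construct: str) -> tuple[str, str]:
    """Split a one-letter construct at the first ENLYFQ|X bond.

    Simplification: only Gly or Ser is accepted at X (the P1' position).
    """
    idx = construct.find(TEV_SITE)
    if idx == -1:
        raise ValueError("no TEV recognition site found")
    cut = idx + len(TEV_SITE)
    if cut >= len(construct) or construct[cut] not in "GS":
        raise ValueError("this sketch expects Gly or Ser after the site")
    return construct[:cut], construct[cut:]


# Hypothetical construct: Met + His6 tag + TEV site + Gly + target turn segment
tag, released = tev_cleave("M" + "HHHHHH" + "ENLYFQ" + "G" + "KIFANGT")
print(tag)       # MHHHHHHENLYFQ
print(released)  # GKIFANGT (note the residual Gly)
```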

A desired polypeptide can also be prepared by one or more of the well known in vitro polypeptide synthesis techniques, particularly solid phase synthesis. This mode of synthesis is also illustrated hereinafter.

Pharmaceutical Compositions

In yet another embodiment of the invention, a contemplated chimeric therapeutic polypeptide is an active ingredient in a pharmaceutical composition for administration to a human patient or suitable animal host such as a chimpanzee, mouse, rat, horse, sheep or the like.

Thus, a contemplated chimeric therapeutic polypeptide is dissolved or dispersed in a pharmaceutically acceptable diluent composition that typically also contains water. When administered to a host animal in need of the polypeptide, such as a mammal (e.g., a mouse, dog, goat, sheep, horse, bovine, monkey, ape, or human) or bird (e.g., a chicken, turkey, duck or goose), the polypeptide provides the benefit of the pre-existing polypeptide.

The amount of chimeric therapeutic polypeptide present in a pharmaceutical composition is referred to as an effective amount and can vary widely, depending, inter alia, upon the polypeptide used, the condition to be treated, and the presence of adjuvants and/or other excipients in the composition. Starting dosages are taken from the literature or from the product label for the corresponding pre-existing therapeutic polypeptide, and the dosages ultimately used are typically somewhat less than those used for the pre-existing therapeutic polypeptide.

The preparation of pharmaceutical compositions that contain proteinaceous materials as active ingredients is well understood in the art. Typically, such compositions are prepared as parenterals, either as liquid solutions or suspensions; solid forms suitable for solution in, or suspension in, liquid prior to injection can also be prepared. The preparation can also be emulsified.

Once purified, a contemplated chimeric therapeutic polypeptide is typically recovered by lyophilization. A pharmaceutical composition is typically prepared from a recovered chimeric therapeutic polypeptide by dispersing the polypeptide, preferably in particulate form, in a physiologically tolerable (acceptable) diluent vehicle such as water, saline, phosphate-buffered saline (PBS), acetate-buffered saline (ABS), Ringer's solution, or the like to form an aqueous composition. Alternatively, the lyophilized polypeptide is mixed with additional solid excipients and stored as such for constitution with water, saline and the like as discussed above.

Excipients that are pharmaceutically acceptable and compatible with the active ingredient are often mixed with the solid polypeptide, or can be predissolved in the liquid medium. Suitable excipients are, for example, water, saline, dextrose, glycerol, ethanol, or the like and combinations thereof. In addition, if desired, a composition can contain minor amounts of auxiliary substances, such as wetting or emulsifying agents and pH buffering agents, that enhance the effectiveness of the composition.

ILLUSTRATIVE EXAMPLES

The adhesion domain of human glycoprotein CD2 (HsCD2ad), a non-therapeutic polypeptide, is glycosylated at Asn65, within the Asn65-Gly66-Thr67 sequon (FIG. 1A). NMR and crystallographic data demonstrate that Asn65 occupies the i+2 position of a five-residue type I β-bulge turn that spans from Phe63 (i) to Thr67 (i+4; FIG. 1B), with Gly66 occupying the i+3 bulge position [Wyss et al., Science 269, 1273-1278 (1995); Wang et al., Cell 97, 791-803 (1999)]. Nuclear Overhauser effects (NOEs) suggest that the side chain of Phe63 at the i position interacts with the hydrophobic face of the first GlcNAc residue of the glycan (Asn65-GlcNAc1-GlcNAc2-), which also packs against the side-chain methyl group of Thr67 (see FIG. 1C for a space-filling view of this cluster).

NOE evidence also suggests the possibility of a stabilizing protein-glycan interaction between GlcNAc2 of the glycan and Lys61 [Wyss et al., Science 269, 1273-1278 (1995)]. Wyss et al. hypothesized that this interaction disperses the positive charge present in a cluster of five Lys residues, but the energetics of this interaction were not probed [Wyss et al., Science 269, 1273-1278 (1995)]. Previous kinetic studies of glycan-dependent HsCD2ad folding suggest that the N-glycan does much more than attenuate unfavorable electrostatic interactions [Hanson et al., Proc Natl Acad Sci U S A 106:3131-3136 (2009)].

Bioinformatic analysis of the Protein Data Bank (PDB) has revealed that aromatic residues are overrepresented two residues before Asn in occupied sequons [Petrescu et al., Glycobiology 14, 103-114 (February, 2004)], leading us to hypothesize that the unusually large stabilizing effect of glycosylation on HsCD2ad folding is largely due to a tripartite Phe63-GlcNAc1-Thr67 interaction. Because nonglycosylated HsCD2ad is unfolded, we tested this hypothesis using the structurally homologous rat ortholog of HsCD2ad (RnCD2ad), which does not require N-glycosylation to fold. Residues 63-67 of RnCD2ad retain the same five-residue type I β-bulge turn geometry found in HsCD2ad (FIG. 2A, inset) [Jones et al., Nature 360, 232-239 (1992)].

We installed the Asn65-Gly66-Thr67 glycosylation sequon from HsCD2ad into the β-bulge turn of RnCD2 by mutating Asp67 to Thr (residues 65 and 66 in the wild-type RnCD2ad sequence are already Asn and Gly, respectively). To generate a version of RnCD2ad that would be glycosylated only within this turn context, we removed three naturally occurring N-glycosylation sequons (by mutating Asn72, Asn82 and Asn89 to Gln, Gln and Asp, respectively). This modified RnCD2ad sequence (which contains only one glycosylation site at Asn65) is referred to as RnCD2*.

RnCD2* folds in the absence of glycosylation (expressed in E. coli), and has Glu at position 61 and Leu at position 63 in contrast to the Lys61 and Phe63 in HsCD2ad (FIG. 2A). These differences make RnCD2* an ideal sequence in which to study the kinetic and thermodynamic consequences of the interactions between the N-glycan, Lys61, and Phe63, using a triple mutant thermodynamic cycle [Jones et al., Nature 360, 232-239 (1992)]. The stabilities and folding kinetics of the eight RnCD2* sequences required for the cycle were determined by equilibrium denaturation and stopped-flow kinetic studies (see FIGS. 2B,C for representative data) and tabulated data are shown in FIG. 2E (N refers to N-glycosylated Asn). Glycosylated (g-RnCD2*) variants appended with Man₆₋₈ oligomannose glycans (determined by ESI-MS; Fig SX) were expressed in Sf9 insect cells.

Glycosylation stabilizes g-RnCD2* by −0.6 kcal mol⁻¹ relative to RnCD2*, an effect 2.5 kcal mol⁻¹ smaller than the stabilization observed upon glycosylation of HsCD2ad. g-RnCD2*-K (Glu61Lys) and g-RnCD2*-F (Leu63Phe) are stabilized by −1.5 and −1.8 kcal mol⁻¹, respectively, relative to the corresponding non-glycosylated variants. These effects are each about 1 kcal mol⁻¹ larger than the stabilization observed upon glycosylation of the unmodified RnCD2*, suggesting that Lys61 and Phe63 in these RnCD2* variants are each able to form stabilizing interactions with the N-glycan at position 65 that are putatively similar to the interactions observed in the NMR structure of HsCD2ad.

The N-glycan-dependent contributions of Lys61 (ΔΔG_f[RnCD2*-K] − ΔΔG_f[RnCD2*]) and Phe63 (ΔΔG_f[RnCD2*-F] − ΔΔG_f[RnCD2*]) to RnCD2* stability are comparable: −0.9 kcal mol⁻¹ and −1.2 kcal mol⁻¹, respectively. Notably, these interactions are synergistic: according to the data in the triple mutant cycle, this synergy amounts to −1.0 kcal mol⁻¹. A comparison of kinetic measurements shows that glycosylated variants that contain Phe63 unfold 20 to 200 times more slowly than the corresponding nonglycosylated variants, suggesting that an interaction between Phe63 and the N-glycan at position 65 stabilizes the native state of RnCD2* (FIG. 2E).
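The glycan-dependent contributions quoted above are differences of differences. The Python sketch below reproduces that arithmetic from the values given in the text. It assumes ΔΔG_f denotes the change in folding free energy on glycosylation (glycosylated minus non-glycosylated, in kcal mol⁻¹), so that a more negative value means greater stabilization; the names `DDG_GLYC` and `glycan_dependent_contribution` are hypothetical.

```python
# Glycosylation-induced change in folding free energy, kcal/mol, as quoted in
# the text (assumed: glycosylated minus non-glycosylated).
DDG_GLYC = {
    "RnCD2*": -0.6,
    "RnCD2*-K": -1.5,  # Glu61Lys
    "RnCD2*-F": -1.8,  # Leu63Phe
}


def glycan_dependent_contribution(variant: str, reference: str = "RnCD2*") -> float:
    """ddG_f[variant] - ddG_f[reference]; negative means extra stabilization."""
    return DDG_GLYC[variant] - DDG_GLYC[reference]


for v in ("RnCD2*-K", "RnCD2*-F"):
    print(v, round(glycan_dependent_contribution(v), 2))
# RnCD2*-K -0.9
# RnCD2*-F -1.2
```

The same subtraction applied to the AcyP2* values reported hereinafter (−2 kcal mol⁻¹ for the Phe-containing variant and +0.5 kcal mol⁻¹ for AcyP2*) gives the −2.5 kcal mol⁻¹ Phe-glycan contribution. The −1.0 kcal mol⁻¹ synergy term additionally needs the doubly substituted variant, whose value appears in FIG. 2E rather than in the text, so it is not computed here.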

Unlike the interaction of the N-glycan with Lys61, which may depend on the presence of a nearby cluster of positively charged residues, the stabilizing tripartite interaction between Phe-i, Asn-N-glycan-i+2 and Thr-i+4 in RnCD2* and HsCD2ad appears to be a self-contained structural module, which we call an enhanced aromatic sequon. We next explored whether incorporating this “enhanced aromatic sequon” into reverse turns in other glycosylation-naïve proteins would also result in substantial increases to stability.

A PDB search supports this possibility by revealing four additional proteins that contain type I β-bulge turns with a Phe at the i position, a glycosylated Asn residue at the i+2 position, and a Thr at the i+4. In each case, the Phe and Thr side chains contact the first GlcNAc of the N-glycan (FIG. 2F). Furthermore, we identified glycosylated type I β-bulge turns in four additional proteins in which aromatic residues other than Phe (Tyr, Trp, or His) occupy the i position, making analogous contacts. This observation highlights the view that aromatic amino acid side chains other than Phe can also enhance glycosylation sequons by engaging in similarly stabilizing interactions with N-glycans in reverse turns.

As disclosed in detail hereinafter, we demonstrate that placing a Phe two or three residues before (upstream of, or toward the amino terminus from) a glycosylated Asn in certain reverse-turn contexts leads to substantial stabilization in three different proteins and constitutes a portable method for increasing glycoprotein stability.

The portability of the stabilization conferred by the enhanced aromatic sequon was tested by integrating it into a glycosylation-naïve reverse turn in human muscle acylphosphatase (AcyP2), a two-layer α/β protein, in which two α-helices pack against a four-stranded β-sheet [Pastore et al., J Mol Biol 224, 427-440 (1992)]. Reverse turn residues 43 to 47 are not well-enough defined in the NMR structure of AcyP2 to discern their precise conformation, but homologous residues in the crystal structure of common type acylphosphatase (57% identical to AcyP2) adopt a type I β-bulge turn conformation [Yeung et al., Acta Crystallogr Sect F Struct Biol Cryst Commun 62, 80-82 (2006)].

Thus, Thr43Phe (i) and Lys45Asn (i+2) mutations in the β-bulge turn create the enhanced aromatic sequon (the i+4 position is already Thr; FIG. 3A). The three additional sequons present in wild-type AcyP2 (which are not normally glycosylated, because AcyP2 is a cytosolic protein) were removed by Ser-to-Ala mutations at positions 44, 82 and 95 to create a modified variant of AcyP2 (AcyP2*; SEQ ID NO:174) that is N-glycosylated only at Asn45.

Four AcyP2* variants, differing in the identity of the side chain at position 43 (Phe or Thr) and in the presence or absence of a glycan at Asn45 (FIGS. 3A,C), were prepared. The Thr43Phe mutant is AcyP2*-F (SEQ ID NO:176), and its glycosylated form is g-AcyP2*-F (SEQ ID NO:177). Glycoproteins g-AcyP2* (SEQ ID NO:175) and g-AcyP2*-F (SEQ ID NO:177), bearing predominantly fucosylated paucimannose glycans, were expressed in Sf9 insect cells.

The folding free energies of each variant were determined by equilibrium denaturation (see FIG. 3B for representative data). Glycoprotein g-AcyP2*-F is stabilized by −2 kcal mol⁻¹ relative to nonglycosylated AcyP2*-F from E. coli. In contrast, glycoprotein g-AcyP2* is destabilized relative to the non-glycosylated AcyP2* by +0.5 kcal mol⁻¹. Thus, the estimated N-glycan-dependent contribution of the Phe-glycan interaction is −2.5 kcal mol⁻¹, suggesting that an interaction between Phe43 and the N-glycan at position 45 (and putatively Thr47) stabilizes the reverse turn, and thus the protein.

In fact, the contribution of the Phe-glycan interaction in AcyP2* is about 1 kcal mol⁻¹ larger than that observed in RnCD2* (FIG. 2E). Even in the absence of structural data for g-AcyP2*-F, these results indicate that the enhanced aromatic sequon is a portable module that can stabilize proteins, such as RnCD2* and AcyP2*, whose glycosylation-naïve reverse turns have not been tailored by evolution for optimal protein-glycan interactions.

In addition to the stabilization conferred by the enhanced aromatic sequon, cellular glycosylation efficiency was consistently enhanced. The ratio of N-glycosylated to non-glycosylated protein from Sf9 insect cells is substantially higher for the Phe-containing RnCD2* and AcyP2* variants than for variants that lack the Phe residue (FIG. 2D and FIG. 3D), suggesting that the enhanced aromatic sequon may be a better substrate for glycosylation by oligosaccharyltransferase (OST). This observation should prove useful for enhancing glycoprotein yields, as sequon occupancy can be variable. The enzymology of this observation merits further investigation, but it is tempting to speculate that OST may have evolved to favor sequences, like the enhanced aromatic sequon, that stabilize proteins upon glycosylation.

The structural information in the PDB suggests that the enhanced aromatic sequon effect depends on the Phe(i)-Xxx-Asn(Glycan)-Gly-Thr(i+4) type I β-bulge turn substructure, which allows the Phe, GlcNAc and Thr side chains to interact optimally.

Several features of this substructure are also likely to be important. The Thr side chain accepts a H-bond from the NH of the first GlcNAc residue and the C═O of the i+2 Asn residue accepts a H-bond from the backbone NH of Thr at the i+4 position. These H-bonds, and the characteristic H-bond between the >C═O of the i+4 Thr and the NH of the i position Phe are largely solvent occluded and may contribute additional enthalpic stabilization to this portable stabilizing substructure.

In principle, reverse turns other than the type I β-bulge turn could also benefit from the tripartite stabilizing interaction between Phe, Thr and Asn-glycan. This hypothesis was tested using a portion of the 34-residue WW domain from human Pin 1 (Pin WW), a glycosylation-naïve β-sheet protein in which three anti-parallel β-strands are connected by two loops. In wild-type WW, loop 1 is an unusual six-residue hydrogen-bonded loop harboring an internal type II β-turn (FIG. 1B); only 0.1% of the reverse turns in the PDB have this conformation [Oliva et al., J Mol Biol 266(4), 814-830 (1997)].

In the Pin1 WW crystal structure, the side chains of Ser16, Ser19, and Arg21 all project on the same side of loop 1, such that the side-chain β-carbons (Cβ's) at these positions are within 5-6 Å of one another [Ranganathan et al., Cell 89, 875-886 (1997)]. Those distances are close enough to facilitate a stabilizing interaction between Phe, GlcNAc1 and Thr, similar to the interactions observed in the glycosylated type I β-bulge turn of HsCD2ad (FIG. 1A) [Wang et al., Cell 97, 791-803 (1999)]. The similar Cβ-Cβ distances in HsCD2ad suggest that positions 16, 19, and 21 might be suitable locations for incorporating the individual elements of the enhanced sequon (Phe at position 16, Asn-linked glycan at position 19, and Thr at position 21, FIG. 4A). In this version of the enhanced aromatic sequon, Phe is at the −3 position relative to the glycosylated Asn, instead of at the −2 position, as in the examples above.
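The distance criterion used to pick positions 16, 19, and 21 (Cβ atoms of the three positions within roughly 5-7 Å of one another) is easy to screen for once coordinates are in hand. The sketch below is illustrative only: the coordinates are hypothetical (not taken from any PDB entry), the helper names are invented for this example, the default 7 Å cutoff is the upper bound stated in the Abstract, and the check covers distances only, not whether the side chains project on the same face of the turn.

```python
import itertools
import math

# Hypothetical C-beta coordinates (angstroms) for three turn positions.
# Illustrative numbers only; NOT taken from any PDB entry.
cbeta = {
    16: (0.0, 0.0, 0.0),
    19: (5.2, 0.0, 0.0),
    21: (2.0, 4.8, 0.0),
}

def max_pairwise_distance(coords):
    """Largest C-beta/C-beta distance among the given positions."""
    return max(math.dist(a, b)
               for a, b in itertools.combinations(coords.values(), 2))

def could_host_sequon(coords, cutoff=7.0):
    """True if every pair of C-beta atoms lies closer than `cutoff` angstroms."""
    return max_pairwise_distance(coords) < cutoff

print(round(max_pairwise_distance(cbeta), 2), could_host_sequon(cbeta))
```

In practice the coordinates would come from a structure file; a stricter cutoff (for example the <5.6 Å noted for the type I′ β-turn) can be passed as `cutoff`.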

Pin1 WW can be synthesized chemically, enabling us to examine the contribution of the Thr side chain to N-glycan-dependent stabilization of Pin WW, in addition to the Phe-glycan interaction explored above. Here, a simple Asn-GlcNAc side chain is used. Eight Pin WW variants were synthesized (FIG. 4E), which contain all possible combinations of the Ser16Phe, Asn19Asn-GlcNAc, and Arg21Thr mutations, enabling triple mutant thermodynamic cycle analysis: WW (SEQ ID NO:204), g-WW (SEQ ID NO:205), WW-F (SEQ ID NO:197), g-WW-F (SEQ ID NO:198), WW-T (SEQ ID NO:199), g-WW-T (SEQ ID NO:200), WW-F,T (SEQ ID NO:201) and g-WW-F,T (SEQ ID NO:202). The thermal stabilities and folding rates of these variants were determined by variable temperature circular dichroism spectroscopy and laser temperature-jump studies, respectively (see FIG. 3B-D for representative data and Tables D and E for the remaining data); tabulated data appear in FIG. 4E.
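The eight-variant set is a full 2 × 2 × 2 factorial design over the three changes (Ser16Phe, Asn19 glycosylation, Arg21Thr), which is what makes triple mutant cycle analysis possible. A small sketch that enumerates the design and reproduces the variant names listed above; the helper `variant_name` is hypothetical and serves only to illustrate the naming.

```python
import itertools

def variant_name(glycosylated, phe16, thr21, base="WW"):
    """Build a name in the g-WW-F,T style used in the text (hypothetical helper)."""
    name = ("g-" if glycosylated else "") + base
    suffix = ",".join(tag for tag, on in (("F", phe16), ("T", thr21)) if on)
    return f"{name}-{suffix}" if suffix else name

# Each mutation is either absent (False) or present (True): 2**3 = 8 variants.
variants = [variant_name(g, f, t)
            for f, t, g in itertools.product([False, True], repeat=3)]
print(variants)
```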

Chemical glycosylation of the Phe-Xxx-Zzz-Asn-Yyy-Thr sequon (SEQ ID NO:135) (with a single GlcNAc, GlcNAc1) in the six-residue loop of WW stabilized the resulting WW variant by 0.7 kcal mol⁻¹ (ΔΔG_(f)=−0.7 kcal mol⁻¹) [herein and in Culyba et al., Science 331, 571-575 (2011)], a smaller effect than observed for the Phe-Xxx-Asn-Yyy-Thr (SEQ ID NO:136) sequon in the type I β-bulge turns of RnCD2 and AcyP2 (ΔΔG_(f)=−1.8 kcal mol⁻¹ and −2.5 kcal mol⁻¹, respectively).

One possible interpretation of these results is that the type II β-turn within the six-residue loop does not promote the stabilizing tripartite interaction between Phe, GlcNAc, and Thr as effectively as does the five-residue type I β-bulge turn. However, key host context differences between the WW, RnCD2, and AcyP2 proteins could also be partially responsible for these observations, including differences in folding topology and mechanism [Nickson et al., Methods 52(1), 38-50 (2010)], and differences in the amino acids that flank the glycosylated reverse turns [Culyba et al., Science 331, 571-575 (2011)].

Moreover, because the WW domain is synthesized chemically via a solid-phase strategy, the N-glycan in WW (GlcNAc) is much smaller than the N-glycans in RnCD2 (oligomannose) and AcyP2 (fucosylated paucimannose). Interactions between the host sequences and these extended glycans could also contribute to the stabilization associated with glycosylating the Phe-Xxx-Asn-Yyy-Thr (SEQ ID NO:136) sequon in the type I β-bulge turns within RnCD2 and AcyP2.

The N-glycan-dependent contribution of Phe16 to Pin WW stability is −0.19 kcal mol⁻¹ in the absence of Thr21, but −0.62 kcal mol⁻¹ in the presence of Thr21. Similarly, the N-glycan-dependent contribution of Thr21 to Pin WW stability is −0.18 kcal mol⁻¹ in the absence of Phe16, but −0.63 kcal mol⁻¹ in the presence of Phe16.

These results strongly suggest the presence of a stabilizing tripartite interaction between Phe16, Asn19-GlcNAc, and Thr21 and provide evidence that the enhanced aromatic sequon can be successfully applied in reverse turn contexts other than the type I β-bulge turns present in HsCD2ad, RnCD2* and AcyP2*. Significant slowing of the unfolding rate in g-WW-F,T relative to WW-F,T suggests that the Phe-GlcNAc-Thr interaction stabilizes the Pin WW native state. Notably, both the kinetic and thermodynamic data provide strong evidence for the importance of Thr in this reverse turn context. Thr has long been known to play a crucial role in the biology of OST-mediated glycosylation, but this is the first clear demonstration of its energetic importance.
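Read as a triple mutant cycle, the two pairs of numbers quoted for Pin WW are two routes to the same quantity: the extra N-glycan-dependent stabilization that Phe16 provides when Thr21 is present should equal the extra stabilization that Thr21 provides when Phe16 is present. A quick check of that arithmetic using only the quoted values (variable names are illustrative):

```python
# N-glycan-dependent contributions quoted in the text (kcal/mol).
phe_effect_without_thr = -0.19
phe_effect_with_thr = -0.62
thr_effect_without_phe = -0.18
thr_effect_with_phe = -0.63

# In a consistent triple mutant cycle, both differences estimate the same
# three-way (Phe16, Asn19-GlcNAc, Thr21) coupling term.
coupling_via_phe = phe_effect_with_thr - phe_effect_without_thr
coupling_via_thr = thr_effect_with_phe - thr_effect_without_phe
print(round(coupling_via_phe, 2), round(coupling_via_thr, 2))
```

The two estimates (about −0.43 and −0.45 kcal mol⁻¹) agree to within a few hundredths of a kcal mol⁻¹, as expected for a genuine three-way interaction.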

It is well established that N-glycosylation can enhance glycoprotein stability, but it was not previously possible to know where to put a glycan to achieve predictable stabilization. Our findings indicate that the Asn-N-glycan, Phe and Thr side chains contribute key interactions that significantly stabilize glycoproteins when appropriately placed in reverse turn contexts, even those that are not normally glycosylated. This observation may account in part for the high frequency of glycosylation in the reverse turns of secreted proteins [Petrescu et al., Glycobiology 14, 103-114 (2004); Zielinska et al., Cell 141, 897-907 (2010)].

The results obtained herein are useful for predicting with increased accuracy whether N-glycosylation at a given site is likely to stabilize a protein, and whether that site is likely to be glycosylated efficiently, information that is critical for glycoprotein engineering. That the enhanced aromatic sequon in a type I β-bulge turn context is found in the PDB with all possible aromatic residues at the i position indicates that aromatics other than Phe are useful.

The WW domain from human Pin 1 also conveniently provides a single protein into which several types of enhanced aromatic sequons and their corresponding reverse turn types can be inserted without changing the overall structure or the flanking sequences. The WW domain is ideal for these requirements: many WW variants harboring different reverse turn types in loop 1 have been characterized structurally [Ranganathan et al., Cell 89, 875-886 (1997); Jäger et al., Proc. Natl. Acad. Sci. USA 103, 10648-10653 (2006); and Fuller et al., Proc. Natl. Acad. Sci. USA 106, 11067-11072 (2009)] and biophysically [Jäger et al., Proc. Natl. Acad. Sci. USA 103, 10648-10653 (2006); Fuller et al., Proc. Natl. Acad. Sci. USA 106, 11067-11072 (2009); Jäger et al., J. Mol. Biol. 311, 373-393 (2001); and Kaul et al., J. Am. Chem. Soc. 123, 5206-5212 (2001)].

Crystal structures exist for WW domains harboring a type II β-turn in a six-residue loop (FIG. 1B), a five-residue type I β-bulge turn (FIG. 1C), and a four-residue type I′ β-turn (FIG. 1D) as loop 1. It is thought that a type I′ β-turn, which makes up 11% of the reverse turns in the PDB [Oliva et al., J Mol Biol 266(4), 814-830 (1997)], would also serve as a conformational host for a complementary enhanced aromatic sequon: the Cβ's of the side chains at the i, i+1, and i+3 positions are close enough (<5.6 Å; see FIG. 1D) to support a stabilizing tripartite interaction among Phe, Asn(GlcNAc1), and Thr. Importantly, the chemical synthesis of homogeneously glycosylated [Bertozzi et al., Science 291, 2357-2364 (2001)] WW domains is efficient [Culyba et al., Science 331, 571-575 (2011); and Price et al., J. Am. Chem. Soc. 132, 15359-15367 (2010)], enabling numerous analogs to be prepared, each having an identical N-glycan (in this case, GlcNAc1).

The data herein show that type I′ β-turns are suitable conformational hosts for a stabilizing enhanced aromatic sequon. This result significantly expands the scope of protein stabilization by glycosylating enhanced aromatic sequons. Furthermore, these data show that the order of stabilization by glycosylating enhanced aromatic sequons in the different turn types is: type I β-bulge turns > type II β-turns in a six-residue loop > type I′ β-turns.

Because enhanced aromatic sequons in proper turn contexts are stabilizing and may be preferred OST substrates, engineering glycoproteins with these sequences is a useful tool for protein evolution. Thermodynamic stabilization has proven essential for the discovery of mutants with enhanced activity where functional requirements might be at odds with optimal folding energetics. The enhanced aromatic sequon design concepts outlined herein should also be immediately applicable to pharmacologic proteins, including antibodies, which could benefit from additional thermodynamic stabilization (and thus increased resistance to proteolysis and aggregation) beyond the numerous other benefits of N-glycosylation, such as improved serum half-life, solubility, and lowered immunogenicity [Li et al., Curr Opin Biotechnol 20, 678-684 (2009); Sinclair et al., J Pharm Sci 94, 1626-1635 (2005); Sola et al., BioDrugs 24, 9-21 (2010); Walsh et al., Nat Biotechnol 24, 1241-1252 (2006)].

Using the ideal platform offered by the loop 1 reverse turn types of the Pin 1 WW domain, as noted above, the four-, five-, and six-residue reverse turns comprising loop 1 of WW were converted to their corresponding enhanced aromatic sequons by replacing the amino acid at position 16 (Ser in all cases) with Phe, the amino acid at position 19 (Asn, Asp, or Ser, respectively) with Asn(GlcNAc1), and the amino acid at position 21 (Arg in all cases) with Thr [Jäger et al., Proc. Natl. Acad. Sci. USA 103, 10648-10653 (2006); and Jäger et al., J Mol Biol 311, 373-393 (2001)].

Note that the same number is used to indicate amino acid residues in analogous positions in WW variants with different loop 1 lengths [Jäger et al., Proc. Natl. Acad. Sci. USA 103, 10648-10653 (2006); and Fuller et al., Proc. Natl. Acad. Sci. USA 106, 11067-11072 (2009)]. Thus, the sequences of the enhanced aromatic sequons in the four-, five-, and six-residue reverse turns comprising loop 1 are Phe16-Asn(GlcNAc1)19-Gly20-Thr21 (SEQ ID NO:137), Phe16-Ala18-Asn(GlcNAc1)19-Gly20-Thr21 (SEQ ID NO:138), and Phe16-Arg17-Ser18-Asn(GlcNAc1)19-Gly20-Thr21 (SEQ ID NO:139), respectively.

The stabilizing effect of glycosylating enhanced aromatic sequons can be quantified by comparing the stabilities of WW variants with glycosylated enhanced aromatic sequons to the stabilities of their non-glycosylated counterparts. The contributions of two- and three-way interactions amongst the Phe16, Asn19(GlcNAc1) and Thr21 side chains to the overall stabilizing effect of glycosylation can be estimated using triple mutant cycle analyses, as done previously [Culyba et al., Science 331, 571-575 (2011)]. This parsing of stabilization energies was accomplished by replacing Phe16, Asn19(GlcNAc1) and Thr21 with Ser16, Asn19, and Arg21, respectively, in every possible combination, for a total of eight proteins in each of the three correlated enhanced aromatic sequon/reverse turn contexts. The results of these analyses are described hereinafter.

The WW variants are named by the number of amino acids in the loop 1 reverse turn, followed by the letter “g” if the variant is N-glycosylated on Asn19, the letter “F” if it has Phe at position 16, and the letter “T” if it has Thr at position 21. The lack of the letters g, F, and/or T indicates that the variant is not N-glycosylated on Asn19, that position 16 is Ser, and/or that position 21 is Arg, respectively. For example, variant 4g-F,T has a 4-residue loop 1 type I′ β-turn, with Asn(GlcNAc1) at position 19, Phe at position 16, and Thr at position 21. Variant 4 has a 4-residue loop 1 type I′ β-turn, with Asn at position 19, Ser at position 16, and Arg at position 21 (see the table hereinafter for the names of the WW variants studied).
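The naming convention maps directly onto the residue 15-21 sequences listed in Table D. A minimal sketch of that mapping, assuming the three non-mutated loop templates given in Table D; the function name `loop1_sequence` and the `N*` marker for Asn(GlcNAc1) are hypothetical choices made for this illustration.

```python
# Loop 1 templates (residues 15-21) for variants 4, 5 and 6, copied from
# Table D; '-' marks positions absent in the shorter loops.
TEMPLATES = {"4": "MS--NGR", "5": "MS-ANGR", "6": "MSRSNGR"}
POS16, POS19, POS21 = 1, 4, 6  # string indices of residues 16, 19 and 21

def loop1_sequence(name):
    """Return the residue 15-21 string for a variant name such as '4g-F,T'.

    Glycosylated Asn19 is written 'N*', an ad hoc stand-in for the
    underlined N used in Table D.
    """
    loop, _, mutations = name.partition("-")
    glycosylated = loop.endswith("g")
    residues = list(TEMPLATES[loop.rstrip("g")])
    tags = mutations.split(",") if mutations else []
    if "F" in tags:
        residues[POS16] = "F"   # Ser16 -> Phe16
    if "T" in tags:
        residues[POS21] = "T"   # Arg21 -> Thr21
    if glycosylated:
        residues[POS19] = "N*"  # Asn19 -> Asn(GlcNAc1)19
    return "".join(residues)

print(loop1_sequence("4g-F,T"), loop1_sequence("5-T"), loop1_sequence("6"))
```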

Stabilization from Glycosylating Enhanced Aromatic Sequons

To quantify the stabilizing effect of glycosylating enhanced aromatic sequons in loop 1 of the corresponding four-, five-, and six-residue reverse turns, we used variable temperature circular dichroism (CD) spectropolarimetry to analyze the thermodynamic stability of WW variants 4-F,T, 4g-F,T, 5-F,T, 5g-F,T, 6-F,T, and 6g-F,T. CD data for 6-F,T and 6g-F,T and their derivatives (described below) have been published previously at a protein concentration of 50 μM [Culyba et al., Science 331, 571-575 (2011)], but were further studied herein at a protein concentration of 10 μM (the energetic data are comparable at both concentrations) to facilitate direct comparisons with 4-F,T, 4g-F,T, 5-F,T, and 5g-F,T and their derivatives (some of which were not completely soluble at 50 μM).

The table below shows the melting temperature (T_(m)) and folding free energy (ΔG_(f), at 65° C.) for each protein and corresponding glycoprotein, along with the effect of glycosylation on T_(m) (ΔT_(m)) and on ΔG_(f) (ΔΔG_(f), at 65° C.). A reference temperature of 65° C. was used because it is within the transition regions of all the variants studied herein. Extrapolating ΔG_(f) to temperatures outside the transition region using thermodynamic parameter estimates from fits to variable temperature CD data is unreliable, because errors in ΔC_(p), the least well-defined parameter from such fits, become magnified outside the transition region. For sets of proteins with similar ΔC_(p) values, the differences between their T_(m) values should reflect the differences between their ΔG_(f) values both at 65° C. and at lower temperatures.

TABLE D*

| WW variant [SEQ ID NO] | Sequence^(#) (residues 15-21) | T_(m) (° C.) | ΔT_(m) upon glycosylation (° C.) | ΔG_(f) (kcal/mol) | ΔΔG_(f) upon glycosylation (kcal/mol) |
|---|---|---|---|---|---|
| 4 [140] | MS--NGR | 64.4 ± 0.4 | 2.2 ± 0.6 | 0.06 ± 0.04 | −0.23 ± 0.06 |
| 4g [141] | MS--N̲GR | 66.6 ± 0.4 | | −0.17 ± 0.04 | |
| 4-F [142] | MF--NGR | 66.7 ± 0.5 | 1.5 ± 0.7 | −0.18 ± 0.08 | −0.18 ± 0.08 |
| 4g-F [143] | MF--N̲GR | 68.2 ± 0.5 | | −0.36 ± 0.05 | |
| 4-T [144] | MS--NGT | 62.2 ± 0.4 | −0.8 ± 0.6 | 0.30 ± 0.04 | 0.07 ± 0.07 |
| 4g-T [145] | MS--N̲GT | 61.4 ± 0.5 | | 0.37 ± 0.05 | |
| 4-F,T [146] | MF--NGT | 63.5 ± 0.3 | 3.2 ± 0.7 | 0.18 ± 0.03 | −0.39 ± 0.09 |
| 4g-F,T [147] | MF--N̲GT | 66.7 ± 0.6 | | −0.21 ± 0.08 | |
| 5 [148] | MS-ANGR | 68.7 ± 0.2 | 0.6 ± 0.3 | −0.38 ± 0.02 | −0.07 ± 0.04 |
| 5g [149] | MS-AN̲GR | 69.3 ± 0.2 | | −0.46 ± 0.03 | |
| 5-F [150] | MF-ANGR | 65.2 ± 0.3 | 5.0 ± 0.4 | −0.02 ± 0.03 | −0.55 ± 0.04 |
| 5g-F [151] | MF-AN̲GR | 70.3 ± 0.2 | | −0.58 ± 0.02 | |
| 5-T [152] | MS-ANGT | 68.9 ± 0.2 | 2.4 ± 0.3 | −0.42 ± 0.02 | −0.23 ± 0.03 |
| 5g-T [153] | MS-AN̲GT | 71.3 ± 0.3 | | −0.65 ± 0.03 | |
| 5-F,T [154] | MF-ANGT | 66.0 ± 0.2 | 9.2 ± 0.2 | −0.11 ± 0.02 | −0.94 ± 0.03 |
| 5g-F,T [155] | MF-AN̲GT | 75.2 ± 0.2 | | −1.05 ± 0.02 | |
| 6 [156] | MSRSNGR | 56.2 ± 0.3 | −2.6 ± 0.4 | 0.95 ± 0.04 | 0.21 ± 0.06 |
| 6g [157] | MSRSN̲GR | 53.6 ± 0.3 | | 1.16 ± 0.04 | |
| 6-F [158] | MFRSNGR | 51.0 ± 0.3 | 0.7 ± 0.4 | 1.45 ± 0.06 | −0.17 ± 0.08 |
| 6g-F [159] | MFRSN̲GR | 51.7 ± 0.3 | | 1.28 ± 0.04 | |
| 6-T [160] | MSRSNGT | 52.5 ± 0.3 | −0.2 ± 0.5 | 1.22 ± 0.05 | 0.04 ± 0.07 |
| 6g-T [161] | MSRSN̲GT | 52.3 ± 0.3 | | 1.26 ± 0.05 | |
| 6-F,T [162] | MFRSNGT | 47.4 ± 0.4 | 7.6 ± 0.5 | 1.72 ± 0.09 | −0.70 ± 0.10 |
| 6g-F,T [163] | MFRSN̲GT | 55.0 ± 0.3 | | 1.02 ± 0.04 | |

*Tabulated data are given as mean ± standard error at 65° C. for WW variants at 10 μM in 20 mM aqueous sodium phosphate, pH 7.
^(#) N̲ = Asn(glycan).
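The caveat about extrapolating ΔG_(f) outside the transition region follows from the standard two-state stability curve (Gibbs-Helmholtz with a temperature-independent ΔC_(p)), written for unfolding as ΔG_u(T) = ΔH_m(1 − T/T_m) + ΔC_p[T − T_m − T ln(T/T_m)], with ΔG_(f) = −ΔG_u. The ΔC_(p) term vanishes at T_(m) and grows roughly as (T − T_(m))² away from it, so an error in ΔC_(p) is amplified far from the transition. The sketch below assumes this model; the ΔH_m and ΔC_(p) values are hypothetical, chosen only to show the trend, and are not fitted values for any WW variant.

```python
import math

def folding_free_energy(t_c, tm_c, dh_m, dcp):
    """Two-state folding free energy (kcal/mol) at t_c (deg C).

    Assumes the Gibbs-Helmholtz stability curve with a temperature-
    independent heat capacity change. dh_m is the unfolding enthalpy at
    Tm (kcal/mol); dcp is the unfolding heat capacity change (kcal/mol/K).
    """
    t, tm = t_c + 273.15, tm_c + 273.15
    dg_unfold = dh_m * (1 - t / tm) + dcp * (t - tm - t * math.log(t / tm))
    return -dg_unfold  # folding is the reverse of unfolding

# Hypothetical parameters, for illustration only (not fitted values).
TM, DH = 66.7, 30.0

for temp in (65.0, 25.0):
    dg_low = folding_free_energy(temp, TM, DH, dcp=0.2)
    dg_high = folding_free_energy(temp, TM, DH, dcp=0.4)
    print(f"{temp:.0f} C: dG_f = {dg_low:.2f} or {dg_high:.2f} kcal/mol "
          f"(spread {abs(dg_high - dg_low):.2f})")
```

With these illustrative numbers, doubling ΔC_(p) changes ΔG_(f) by well under 0.01 kcal mol⁻¹ at 65° C. but by roughly 0.5 kcal mol⁻¹ at 25° C., which is why comparisons are made inside the transition region.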

The T_(m) of glycoprotein 4g-F,T is 3.2±0.7° C. higher than that of protein 4-F,T (ΔΔG_(f)=−0.39±0.09 kcal mol⁻¹ at 65° C.), indicating that glycosylating the Phe-Asn-Yyy-Thr (SEQ ID NO:164) enhanced aromatic sequon in the context of a four-residue type I′ β-turn stabilizes WW. Glycosylating the Phe-Xxx-Asn-Yyy-Thr (SEQ ID NO:136) sequon in the context of the five-residue type I β-bulge turn also stabilizes WW (ΔT_(m)=9.2±0.2° C., ΔΔG_(f)=−0.94±0.03 kcal mol⁻¹ at 65° C.), as does glycosylating the Phe-Xxx-Zzz-Asn-Yyy-Thr (SEQ ID NO:135) sequon in the type II β-turn in a six-residue loop (ΔT_(m)=7.6±0.5° C., ΔΔG_(f)=−0.70±0.10 kcal mol⁻¹ at 65° C.). These data indicate that the Phe-Xxx-Asn-Yyy-Thr (SEQ ID NO:136) enhanced aromatic sequon corresponding to the five-residue type I β-bulge turn is, overall, the best for stabilizing WW amongst those studied here.
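The ranking of turn types (type I β-bulge turn > type II β-turn in a six-residue loop > type I′ β-turn) follows directly from the non-glycosylated/glycosylated F,T pairs in Table D. A sketch of that comparison using the tabulated means only (standard errors are ignored here; the dictionary layout is an illustrative choice):

```python
# (non-glycosylated, glycosylated) F,T pairs from Table D:
# dG = folding free energy at 65 C (kcal/mol), Tm in deg C.
table_d = {
    "type I' beta-turn (4)":             {"dG": (0.18, -0.21), "Tm": (63.5, 66.7)},
    "type I beta-bulge turn (5)":        {"dG": (-0.11, -1.05), "Tm": (66.0, 75.2)},
    "type II beta-turn, 6-res loop (6)": {"dG": (1.72, 1.02), "Tm": (47.4, 55.0)},
}

effects = {}
for turn, d in table_d.items():
    ddg = d["dG"][1] - d["dG"][0]  # glycosylated minus non-glycosylated
    dtm = d["Tm"][1] - d["Tm"][0]
    effects[turn] = (round(ddg, 2), round(dtm, 1))

# Most stabilizing (most negative ddG_f) first.
for turn, (ddg, dtm) in sorted(effects.items(), key=lambda kv: kv[1][0]):
    print(f"{turn}: ddG_f = {ddg:+.2f} kcal/mol, dTm = {dtm:+.1f} C")
```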

Interaction Energies in Enhanced Aromatic Sequons from Triple Mutant Cycle Analysis

To determine whether Phe, Asn(GlcNAc1) and Thr side chains interact similarly in each correlated enhanced aromatic sequon/reverse turn context, the thermodynamic stabilities of each WW variant were measured in the four-, five-, and six-residue reverse turn groups in the table above. The data from each group of eight WW variants comprise a triple mutant cycle (FIG. 5). Triple mutant cycles contain more information than conventional double mutant cycles, because each of the six “faces” of a triple mutant cycle “cube” is itself a double mutant cycle [Horovitz et al., J Mol Biol 224(3), 733-740 (1992)]. Whereas double mutant cycles provide information about the energetic impact of an interaction between two residues, a triple mutant cycle provides information about the energetic impact of the two- and three-way interactions.

Extracting this information from a triple mutant cycle is straightforward, and begins with analyzing the double mutant cycle faces of the triple mutant cycle cube (FIG. 5). The double mutant cycle formed by proteins 4 and 4-F and glycoproteins 4g and 4g-F (the front face of the triple mutant cycle cube in FIG. 5A) reveals that glycosylation of Asn19 (in the presence of Ser16 and Arg21) stabilizes glycoprotein 4g relative to protein 4 (ΔΔG_(f,1)=−0.23±0.06 kcal mol⁻¹ at 65° C.). Similarly, glycosylation of Asn19 (in the presence of Phe16 and Arg21) stabilizes 4g-F relative to 4-F (ΔΔG_(f,2)=−0.18±0.08 kcal mol⁻¹ at 65° C.). The difference between ΔΔG_(f,2) and ΔΔG_(f,1) (ΔΔΔG_(f,front)=0.05±0.10 kcal mol⁻¹ at 65° C.) indicates that changing Ser16 to Phe16 (while keeping Arg21 constant) does not significantly change the effect of glycosylating Asn19 in the four-residue type I′ β-turn. In other words, Phe16 and Asn(GlcNAc1)19 do not interact favorably in 4g-F.

Changing Arg21 to Thr21 changes this trend. The double mutant cycle formed by proteins 4-T, 4g-T, 4-F,T, and 4g-F,T (the back face of the triple mutant cycle “cube” shown in FIG. 5A) reveals that in the presence of Thr21 (instead of Arg21), Phe16 and Asn(GlcNAc1) interact favorably (ΔΔΔG_(f,back)=−0.46±0.11 kcal mol⁻¹ at 65° C.). The difference between the front and back double mutant cycles is an estimate of the energy of the three-way interaction between Phe16, Asn(GlcNAc1)19, and Thr21. The large difference between ΔΔΔG_(f,back) and ΔΔΔG_(f,front) for the four-residue type I′ β-turn (ΔΔΔΔG_(f)=−0.51±0.15 kcal mol⁻¹ at 65° C.) indicates that Phe16, Asn(GlcNAc1)19 and Thr21 engage in a favorable three-way interaction in 4g-F,T.
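The uncertainties quoted for the four-residue cycle (±0.10, ±0.11, and ±0.15 kcal mol⁻¹) are consistent with adding independent standard errors in quadrature at each level of differencing. That propagation rule is an assumption about how the errors were combined; the sketch below only shows that it reproduces the reported values from the Table D standard errors.

```python
import math

def quad(*errors):
    """Combine independent standard errors in quadrature."""
    return math.sqrt(sum(e * e for e in errors))

# Standard errors of the glycosylation effects (ddG_f, kcal/mol) for the
# four-residue turn, from Table D.
se_front = quad(0.06, 0.08)              # 4g vs 4, and 4g-F vs 4-F
se_back = quad(0.07, 0.09)               # 4g-T vs 4-T, and 4g-F,T vs 4-F,T
se_three_way = quad(se_front, se_back)   # back face minus front face
print(round(se_front, 2), round(se_back, 2), round(se_three_way, 2))
```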

Similar analyses of the triple mutant cycles formed by proteins 5 and 6 and their derivatives (FIGS. 5B, 5C) reveal a favorable interaction between Phe16, Asn(GlcNAc1)19, and Thr21 in the five-residue type I β-bulge turn (ΔΔΔΔG_(f)=−0.23±0.07 kcal mol⁻¹ at 65° C.) and in the type II β-turn in a six-residue loop (ΔΔΔΔG_(f)=−0.36±0.15 kcal mol⁻¹ at 65° C.). This three-way interaction between Phe16, Asn(GlcNAc1)19, and Thr21 is similarly favorable in each reverse turn context (perhaps more favorable in the type I′ β-turn than in the type I β-bulge turn, but recall that this is only part of the overall stabilizing effect of N-glycosylation).
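The face-by-face bookkeeping described above can be written compactly. A sketch that reproduces the front-face, back-face, and three-way values for all three loop lengths from the Table D means (standard errors are not propagated here; the dictionary layout and function names are illustrative choices, not the authors' analysis code):

```python
# Table D folding free energies at 65 C (kcal/mol) for each loop length,
# keyed by (glycosylated, Phe16, Thr21); 0 = absent, 1 = present.
DG = {
    4: {(0, 0, 0): 0.06, (1, 0, 0): -0.17, (0, 1, 0): -0.18, (1, 1, 0): -0.36,
        (0, 0, 1): 0.30, (1, 0, 1): 0.37, (0, 1, 1): 0.18, (1, 1, 1): -0.21},
    5: {(0, 0, 0): -0.38, (1, 0, 0): -0.46, (0, 1, 0): -0.02, (1, 1, 0): -0.58,
        (0, 0, 1): -0.42, (1, 0, 1): -0.65, (0, 1, 1): -0.11, (1, 1, 1): -1.05},
    6: {(0, 0, 0): 0.95, (1, 0, 0): 1.16, (0, 1, 0): 1.45, (1, 1, 0): 1.28,
        (0, 0, 1): 1.22, (1, 0, 1): 1.26, (0, 1, 1): 1.72, (1, 1, 1): 1.02},
}

def glycosylation_effect(dg, phe, thr):
    """ddG_f for glycosylating Asn19 with Phe16/Thr21 present (1) or absent (0)."""
    return dg[(1, phe, thr)] - dg[(0, phe, thr)]

def triple_cycle(dg):
    """Front face (Arg21), back face (Thr21), and their difference."""
    front = glycosylation_effect(dg, 1, 0) - glycosylation_effect(dg, 0, 0)
    back = glycosylation_effect(dg, 1, 1) - glycosylation_effect(dg, 0, 1)
    return front, back, back - front

for loop, dg in DG.items():
    front, back, three_way = triple_cycle(dg)
    print(f"{loop}-residue loop: front {front:+.2f}, back {back:+.2f}, "
          f"three-way {three_way:+.2f} kcal/mol")
```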

The attribution of ΔΔΔG_(f,front) and ΔΔΔG_(f,back) values to the interaction between Phe16 and Asn(GlcNAc1)19, and of ΔΔΔΔG_(f) to the tripartite interaction among Phe16, Asn(GlcNAc1)19, and Thr21, assumes that the Ser16 side chain does not interact with the side chains at positions 19 or 21, and that the Arg21 side chain does not interact with the side chains at positions 16 or 19, in any variant. This assumption is, to a first approximation, consistent with the available structural data. Crystal structures of WW in the context of the full-length Pin1 protein [Ranganathan et al., Cell 89, 875-886 (1997); and Jäger et al., Proc. Natl. Acad. Sci. USA 103, 10648-10653 (2006)] show that the side chains at positions 16, 19, and 21 generally interact only with solvent or the main chain (see FIG. 1B-1D).

The lone exception is an interaction between the side chain hydroxyl of Ser16 and the side chain carboxylate of Asp19 in the type I β-bulge turn (FIG. 1C). However, the equivalent interaction (between the Ser16 hydroxyl and the Asn19 side chain carbonyl) in the variants of 5 that have Ser at position 16 (5, 5g, 5-T, and 5g-T) should be the same whether Asn19 is N-glycosylated or not, and thus should not affect the analysis.

It is also noted that the reverse turn structures are likely to depend primarily on loop length and the identities of a few key residues (e.g., Asn19 and Gly20 in the variants of 4, and Gly20 in the variants of 5, because these amino acids are strongly favored at these positions of type I′ β-turns and type I β-bulge turns, respectively) [Sibanda et al., J Mol Biol 206(4), 759-777 (1989); and Jäger et al., Proc. Natl. Acad. Sci. USA 103, 10648-10653 (2006)]. Because these factors are kept constant within the variants that make up each triple mutant cycle, the corresponding reverse turn structures should remain roughly constant as well.

Least-squares regression was used to extract additional information about interactions amongst Phe, Asn(GlcNAc1), and Thr from the triple mutant cycles formed by WW variant groups 4, 5, and 6. The folding free energy data (at 65° C.) from the triple mutant cycle formed by 4 and its derivatives were fit to the following Equation (A):

$$\Delta G_{f} = \Delta G_{f}^{0} + C_{F} W_{F} + C_{\underline{N}} W_{\underline{N}} + C_{T} W_{T} + C_{F,\underline{N}} W_{F} W_{\underline{N}} + C_{F,T} W_{F} W_{T} + C_{\underline{N},T} W_{\underline{N}} W_{T} + C_{F,\underline{N},T} W_{F} W_{\underline{N}} W_{T} \qquad \text{(A)}$$

Equation A shows how the ΔG_(f) of a given variant of 4 is related to the average ΔG_(f)^(0) of 4, plus a series of correction terms that account for the interactions amongst the amino acids at positions 16, 19, and 21. Each correction term is a product of one or more indicator variables W (which reflect whether a mutation is present in the given variant) and a free energy contribution factor C. W_(F) is 0 when position 16 is Ser or 1 when it is Phe; W_(N) is 0 when position 19 is Asn or 1 when it is Asn(GlcNAc1); W_(T) is 0 when position 21 is Arg or 1 when it is Thr. C_(F), C_(N), and C_(T) describe the energetic consequences of the Ser16 to Phe16, Asn19 to Asn(GlcNAc1)19, and Arg21 to Thr21 mutations, respectively. These energies are thought to reflect the differences in conformational preferences between Ser and Phe at position 16, Asn and Asn(GlcNAc1) at position 19, and Arg and Thr at position 21.

C_(F,N), C_(F,T), and C_(N,T) describe the free energies of the two-way interactions between Phe16 and Asn(GlcNAc1)19, between Phe16 and Thr21, and between Asn(GlcNAc1)19 and Thr21, respectively. C_(F,N,T) describes the energetic impact of the three-way interaction between Phe16, Asn(GlcNAc1)19, and Thr21. C_(F,N), C_(F,T), C_(N,T), and C_(F,N,T) are essentially equivalent to the two- and three-way interaction energies (ΔΔΔG_(f) and ΔΔΔΔG_(f) values) that could be calculated by a conventional analysis (e.g., as in the preceding section) of the triple mutant cycle data [Horovitz et al., J Mol Biol 224(3), 733-740 (1992)], but obtaining them by regression is more convenient and can provide their standard errors in the regression output. The same analysis was applied to the folding free energy data at 65° C. (338.15 K) for the glycosylated and non-glycosylated WW variants harboring a four-, five-, or six-residue reverse turn in loop 1, and the results are shown in the table below. Note that the caveats to the conventional analysis of triple mutant cycle data mentioned in the preceding section apply to this analysis as well.

TABLE E*

| Parameter (kcal/mol) | Type I′ β-turn | Type I β-bulge turn | Type II β-turn in six-residue loop |
|---|---|---|---|
| ΔG_(f)^(0) | 0.06 ± 0.06 (0.287) | −0.38 ± 0.04 (0.000) | 0.95 ± 0.04 (0.000) |
| C_(F) | −0.24 ± 0.08 (0.005) | 0.36 ± 0.06 (0.000) | 0.50 ± 0.06 (0.000) |
| C_(N) | −0.23 ± 0.08 (0.009) | −0.07 ± 0.06 (0.248) | 0.21 ± 0.06 (0.000) |
| C_(T) | 0.23 ± 0.08 (0.010) | −0.04 ± 0.06 (0.557) | 0.27 ± 0.06 (0.000) |
| C_(F,N) | 0.05 ± 0.11 (0.661) | −0.48 ± 0.09 (0.000) | −0.38 ± 0.08 (0.000) |
| C_(F,T) | 0.15 ± 0.11 (0.168) | −0.05 ± 0.09 (0.562) | 0.00 ± 0.08 (0.983) |
| C_(N,T) | 0.31 ± 0.12 (0.015) | −0.16 ± 0.09 (0.088) | −0.17 ± 0.08 (0.051) |
| C_(F,N,T) | −0.54 ± 0.15 (0.001) | −0.23 ± 0.12 (0.078) | −0.36 ± 0.11 (0.006) |

* Parameters are given as mean ± standard error. P values, given in parentheses, indicate the probability that random sampling error accounts for the difference between zero and the observed value of the parameter.
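Equation A can be fit with any linear least-squares routine by building a design matrix from the indicator products. When only the eight mean ΔG_(f) values of one cycle are used, however, the model has eight coefficients for eight observations, so it is saturated: it fits exactly, and the estimates reduce to an inclusion-exclusion (Möbius) sum over the mutation subsets. The sketch below uses that closed form for the four-residue turn with the Table D means. It is not the authors' fitting code; because it uses rounded means without replicates, it yields no standard errors and can differ from Table E in the second decimal place (for example, C_(F,N,T) comes out at −0.51 here versus −0.54 ± 0.15 in Table E). Names such as `fit_equation_a` are hypothetical.

```python
from itertools import combinations

# Table D folding free energies (kcal/mol, 65 C) for the four-residue turn,
# keyed by the set of changes present: F = Phe16, N = Asn(GlcNAc1)19, T = Thr21.
DG4 = {
    frozenset(): 0.06, frozenset("N"): -0.17, frozenset("F"): -0.18,
    frozenset("FN"): -0.36, frozenset("T"): 0.30, frozenset("NT"): 0.37,
    frozenset("FT"): 0.18, frozenset("FNT"): -0.21,
}

def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def fit_equation_a(dg):
    """Exact (saturated) solution of Equation A for eight mean values.

    C_S = sum over T subset of S of (-1)**(|S|-|T|) * dG(T); with eight
    observations and eight coefficients this equals the least-squares fit.
    """
    return {s: sum((-1) ** (len(s) - len(t)) * dg[t] for t in subsets(s))
            for s in dg}

def predict(coeffs, variant):
    """Evaluate Equation A: add every coefficient whose indicators are all 1."""
    return sum(c for s, c in coeffs.items() if s <= variant)

coeffs = fit_equation_a(DG4)
for s in sorted(coeffs, key=lambda s: (len(s), sorted(s))):
    label = "dG0" if not s else "C_" + ",".join(sorted(s, key="FNT".index))
    print(label, round(coeffs[s], 2))
```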

According to Equation A, the stabilizing effect of glycosylating the enhanced aromatic sequon in 4-F,T [ΔΔG_(f)=ΔG_(f)(4g-F,T)−ΔG_(f)(4-F,T)] is equal to the sum of the corresponding values of C_(N), C_(F,N), C_(N,T), and C_(F,N,T). The same is true for 5-F,T and 6-F,T. Thus, by comparing the C_(N), C_(F,N), C_(N,T), and C_(F,N,T) values, one can trace the origins of the stabilizing effect of glycosylating the enhanced aromatic sequon in each reverse turn context (FIG. 6).
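Setting W_(F) = W_(T) = 1 in Equation A and subtracting the W_(N) = 0 case from the W_(N) = 1 case leaves exactly the four glycan-containing terms. A check of that bookkeeping against the published Table E coefficients: the sums reproduce the Table D values for the five- and six-residue turns and agree within the stated uncertainty for the four-residue turn (−0.41 versus −0.39 ± 0.09 kcal mol⁻¹). The dictionary keys are illustrative labels.

```python
# Table E coefficients (kcal/mol, 65 C) that contain the glycan indicator.
table_e = {
    "type I' beta-turn":          {"C_N": -0.23, "C_FN": 0.05, "C_NT": 0.31, "C_FNT": -0.54},
    "type I beta-bulge turn":     {"C_N": -0.07, "C_FN": -0.48, "C_NT": -0.16, "C_FNT": -0.23},
    "type II beta-turn (6-loop)": {"C_N": 0.21, "C_FN": -0.38, "C_NT": -0.17, "C_FNT": -0.36},
}
# Glycosylation effect on the F,T variants from Table D, for comparison.
table_d_ddg = {"type I' beta-turn": -0.39, "type I beta-bulge turn": -0.94,
               "type II beta-turn (6-loop)": -0.70}

sums = {turn: c["C_N"] + c["C_FN"] + c["C_NT"] + c["C_FNT"]
        for turn, c in table_e.items()}
for turn, total in sums.items():
    print(f"{turn}: sum = {total:+.2f}, Table D ddG_f = {table_d_ddg[turn]:+.2f}")
```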

Changing Asn19 to Asn(GlcNAc1)19 affects each turn type differently: it stabilizes the four-residue type I′ β-turn (C_(N)=−0.23±0.08 kcal mol⁻¹), does not substantially affect the five-residue type I β-bulge turn (C_(N)=−0.07±0.06 kcal mol⁻¹), and destabilizes the type II β-turn within a six-residue loop (C_(N)=0.21±0.06 kcal mol⁻¹). It is possible that Asn(GlcNAc1) has backbone dihedral angle preferences that are more compatible with the i+1 position of a type I′ β-turn than with the i+2 position of a type I β-bulge turn or with the i+3 position of a type II β-turn. If so, such preferences would differ substantially from those of Asn itself [Hovmoller et al., Acta Crystallogr D 58, 768-776 (2002)], which is favored at i+1 in a type I′ β-turn and at i+2 in a type I β-turn, but not at i+3 in a type II β-turn [Hutchinson et al., Protein Sci 3(12), 2207-2216 (1994)].

The two-way interaction between Phe16 and Asn(GlcNAc1)19 stabilizes the five-residue type I β-bulge turn (C_(F,N)=−0.48±0.09 kcal mol⁻¹) and the type II β-turn within the six-residue loop (C_(F,N)=−0.38±0.08 kcal mol⁻¹), but does not substantially change the stability of the four-residue type I′ β-turn (C_(F,N)=0.05±0.11 kcal mol⁻¹). These differences do not appear to correlate with differences among the Cβ-Cβ distances between positions 16 and 19 in the four-, five-, and six-residue turns (FIG. 1B-D), although it is possible that the backbone flexibility and/or the direction of the Cα-Cβ bond vectors in the five- and six-residue turns permit better two-way interactions between Phe16 and Asn(GlcNAc1)19 than are possible in the four-residue turn.

The two-way interaction between Asn(GlcNAc1)19 and Thr21 stabilizes the five- and six-residue turns (C_(N,T)=−0.16±0.09 kcal mol⁻¹ and −0.17±0.08 kcal mol⁻¹, respectively), but substantially destabilizes the four-residue turn (C_(N,T)=0.31±0.12 kcal mol⁻¹). Published structural data [Wyss et al., Science 269, 1273-1278 (1995)] indicate that the glycosylated enhanced aromatic sequon in an analogous type I β-bulge turn in HsCD2ad involves three hydrogen bonds between Thr and Asn(GlcNAc1): one between the Thr side-chain oxygen and the amide proton of the 2-acetamido group of GlcNAc, and two between the Asn side-chain amide carbonyl oxygen and the backbone amide and side-chain hydroxyl protons of Thr (FIG. 1A). The differences observed here between the C_(N,T) values in the four-, five-, and six-residue turn contexts could reflect the presence of analogous hydrogen bonds in the type I β-bulge turn of 5g-F,T and in the six-residue loop of 6g-F,T, but not in the type I′ β-turn of 4g-F,T.

The C_(F,N,T) values for the four-residue type I′ β-turn (C_(F,N,T)=−0.54±0.15 kcal mol⁻¹), the five-residue type I β-bulge turn (C_(F,N,T)=−0.23±0.12 kcal mol⁻¹), and the type II β-turn within a six-residue loop (C_(F,N,T)=−0.36±0.11 kcal mol⁻¹) mirror the ΔΔΔΔG_(f) values obtained by comparison of the front and back double mutant cycles in each triple mutant cube in FIG. 5, confirming that the three-way interaction between Phe16, Asn(GlcNAc1)19, and Thr21 stabilizes each reverse turn type by similar amounts.

Discussion

Glycosylating an enhanced aromatic sequon in its correlated reverse turn context is stabilizing. However, the origins of this stabilizing effect differ amongst the enhanced aromatic sequon/reverse turn pairs (FIG. 6).

In the type I′ β-turn, this effect comes predominantly from the three-way interaction between Phe16, Asn(GlcNAc1)19, and Thr21 (C_(F,N,T)) and from the Asn19 to Asn(GlcNAc1)19 mutation (C_(N)), offset by an unfavorable two-way interaction between Asn(GlcNAc1)19 and Thr21 (C_(N,T)).

In the type I β-bulge turn, the two-way interaction between Phe16 and Asn(GlcNAc1)19 (C_(F,N)) contributes more than does the three-way interaction between Phe16, Asn(GlcNAc1)19, and Thr21 (C_(F,N,T)). In the type II β-turn within a six-residue loop, the two-way interaction between Phe16 and Asn(GlcNAc1)19 (C_(F,N)) and the three-way interaction between Phe16, Asn(GlcNAc1)19 and Thr21 (C_(F,N,T)) contribute similar amounts, offset by the unfavorable effect of the Asn19 to Asn(GlcNAc1)19 mutation (C_(N)). Despite these differences, the results provided here show that each reverse turn type is a suitable host for its corresponding enhanced aromatic sequon.

Adding N-glycans to naïve sites in proteins can be an attractive strategy for increasing their stability. This approach has been used in the development of protein drugs [Walsh et al., Nat Biotechnol 24(10), 1241-1252 (2006); Sinclair et al., J Pharm Sci-Us 94(8), 1626-1635 (2005); Li et al., Curr Opin Biotech 20(6), 678-684 (2009); and Sola et al., Biodrugs 24(1), 9-21 (2010)], where new N-glycans can extend serum half-life [Egrie et al., Exp Hematol 31(4), 290-299 (2003); Su et al., Int J Hematol 91(2), 238-244 (2010); and Ceaglio et al., Biochimie 90(3), 437-449 (2008)] and shelf-life, owing in part to increased protease resistance [Raju et al., Biochem Bioph Res Co 341(3), 797-803 (2006)], decreased aggregation propensity, and compensation for the destabilizing effect of methionine oxidation [Liu et al., Biochemistry 47(18), 5088-5100 (2008)]. Historically, efforts to increase protein stability via N-glycosylation have depended on a trial-and-error approach [Ceaglio et al., Biochimie 90(3), 437-449 (2008); and Elliott et al., J. Biol. Chem. 279, 16854-16862 (2004)], which resulted in unpredictable energetic consequences [Price et al., J. Am. Chem. Soc. 132, 15359-15367 (2010); Hackenberger et al., J. Am. Chem. Soc. 127, 12882-12889 (2005); and Chen et al., Proc. Natl. Acad. Sci. USA 107(52), 22528-22533 (2010)].

By matching each enhanced aromatic sequon to an appropriate reverse turn conformation, the present invention has provided engineering guidelines by which N-glycosylation can reliably stabilize proteins. These matches include Phe-Asn-Yyy-Thr (SEQ ID NO:164) for type I′ β-turns, Phe-Xxx-Asn-Yyy-Thr (SEQ ID NO:136) for type I β-bulge turns, and Phe-Xxx-Zzz-Asn-Yyy-Thr (SEQ ID NO:135) for type II β-turns within a six-residue loop. Each appears to facilitate native-state stabilizing interactions between Phe, Asn(GlcNAc) and Thr in glycosylation-naïve proteins that have not evolved to optimize protein-carbohydrate interactions [Culyba et al., Science 331, 571-575 (2011)]. The structure-stability relationships unveiled by this work also enable investigators to better predict which glycans can be removed from a glycoprotein to increase crystallization propensity without yielding an unfolded or destabilized protein.

As noted earlier, the type I β-bulge turn and the type II β-turn in a six-residue loop (in which the Phe-Xxx-Asn-Yyy-Thr (SEQ ID NO:136) and Phe-Xxx-Zzz-Asn-Yyy-Thr (SEQ ID NO:135) sequons were previously applied, respectively) comprise less than 9% of all reverse turns in the PDB [Sibanda et al., J Mol Biol 206(4), 759-777 (1989); and Oliva et al., J Mol Biol 266(4), 814-830 (1997)]. By successfully applying the new Phe-Asn-Yyy-Thr (SEQ ID NO:164) enhanced aromatic sequon to the type I′ β-turn (which comprises nearly 11% of all reverse turns in the PDB), the number of candidate proteins in which enhanced aromatic sequons can be employed without altering the conformation or the number of residues comprising the native reverse turn is doubled [DeGrado et al., Annu Rev Biochem 68, 779-819 (1999); and Gellman, Curr Opin Chem Biol 2(6), 717-725 (1998)].

Materials and Methods

General

Unless otherwise noted, chemicals and products were purchased from Fisher Scientific or Sigma-Aldrich. Phosphate buffered saline (PBS) was prepared from PBS tablets (SIGMA P-4417) and maintained at pH 7.2 with 0.5 mM TCEP and 0.01% sodium azide. Acetate buffer (50 mM, pH 5.5) was prepared from a 4× stock, which was itself made from 4× solutions of acetic acid (Acros Organics 124040025) and sodium acetate trihydrate (SIGMA 236500). Acetate buffer also contained 0.5 mM TCEP and 0.01% sodium azide. All buffer solutions were filtered (Millipore, 0.2 μm). Protein was concentrated using Amicon centrifugation devices, MWCO 3 kDa (Millipore). Final concentrations of RnCD2* and AcyP2* variants were determined from absorbance at 280 nm using calculated extinction coefficients (ExPASy ProtParam tool, Swiss Institute of Bioinformatics). All oligonucleotides for site-directed mutagenesis were purchased from Integrated DNA Technologies (IDT) at the 25 nmol scale, normalized to 100 μM in IDTE, pH 8.0. Wild-type RnCD2 and AcyP2 gene constructs were ordered from IDT as miniGenes in pZErO-2 vectors (kanamycin resistant).

RnCD2 Amino Acid Sequence

The wild-type RnCD2 sequence used as the template for site-directed mutagenesis is:

(SEQ ID NO: 165) HHHHHHENLYFQS DYKDDDDKIEGR ADCRDSGTVW GALGHGINLN IPNFQMTDDI DEVRWERGSTLV AEFKRKMKPF LKSGAFEILA NGDLKIKNLT RDDSGTYNVTVY STNGTRILDK ALDLRILEM RnCD2.

The first 6 residues are a 6× histidine tag, included for nickel affinity chromatography. This tag is followed by a 7-residue Tobacco Etch Virus protease cleavage site (TEVs), then by an 8-residue FLAG tag (DYKDDDDK), and then by the 4-residue Factor Xa cleavage site (Xas, IEGR), which was included so that all of the tags could be removed from the expressed protein; this was done before any measurements were taken.

All residues are numbered to correspond to homologous residues in human CD2ad. Thus, the numbering begins with 3; i.e., Ala3, and all following residue numbers increase sequentially. It should also be noted that some sequence changes were made to all mutants to ensure that the protein was only glycosylated at the desired position (Asn65) when expressed in Sf9 cells.

The wild type RnCD2 sequence contains three glycosylation sequons. The asparagines in these positions, Asn72, Asn82, and Asn89, were mutated to glutamine, glutamine, and aspartic acid (underlined), respectively. Finally, to confer glycosylation at Asn65 (bold), Asp67 (bold and underlined) was mutated to threonine:

(SEQ ID NO: 166) HHHHHHENLYFQS DYKDDDDKIEGR ADCRDSGTVW GALGHGINLN IPNFQMTDDI DEVRWERGSTLV AEFKRKMKPF LKSGAFEILA NGTLKIKELT RDDSGTYEVTVY STDGTRILDK ALDLRILEM RnCD2*.

AcyP2 Amino Acid Sequence

The wild-type AcyP2 sequence used as the template for site-directed mutagenesis is:

SEQ ID NO: 173 HHHHHHENLYFQS DYKDDDDKIEGR MSTAQSLKSV DYEVFGRVQG VCFRMYTEDE ARKIGVVGWV KNTSKGTVTG QVQGPEDKVN SMKSWLSKVG SPSSRIDRTN FSNEKTISKL EYSNFSIRY

The same purification tag and protease cleavage sites used for the RnCD2* variants were used for the AcyP2* variants, and, as with RnCD2*, the entire tag was removed by Factor Xa cleavage before all studies. Residues are numbered starting with the first residue (Met) after the Factor Xa cleavage site. As with RnCD2*, sequence changes were made to all mutants to ensure that the protein was glycosylated only at the desired position (position 45) when expressed in Sf9 cells. The wild-type AcyP2 sequence contains three glycosylation sequons. The serines at the third position of these sequons, Ser44, Ser82, and Ser96 (underlined), were mutated to alanine. Finally, to confer glycosylation at position 45, Lys45 (bold and underlined) was mutated to asparagine:

SEQ ID NO: 174 HHHHHHENLYFQS DYKDDDDKIEGR MSTAQSLKSV DYEVFGRVQG VCFRMYTEDE ARKIGVVGWV KNTANGTVTG QVQGPEDKVN SMKSWLSKVG SPSSRIDRTN FANEKTISKL EYSNFAIRY
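The sequon bookkeeping described for RnCD2* and AcyP2* can be checked mechanically. The following is a minimal Python sketch, assuming the canonical rule stated in the Background (Asn-Yyy-Thr/Ser with Yyy not Pro) and the numbering conventions above (Ala3 for RnCD2, Met1 for AcyP2); the function and constant names are illustrative only. The sequences are the mature proteins (tags removed) transcribed from the listings, with the grouping spaces ignored.

```python
import re

# Canonical N-glycosylation sequon: Asn-Yyy-Thr/Ser, where Yyy is not Pro.
# A lookahead is used so that overlapping candidates are all examined.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence, first_residue_number=1):
    """Return the residue numbers of Asn residues that begin a sequon."""
    residues = "".join(sequence.split())  # drop the spaces used for grouping
    return [m.start() + first_residue_number for m in SEQUON.finditer(residues)]


# Mature sequences (tags removed), transcribed from the listings above.
RNCD2_WT = ("ADCRDSGTVW GALGHGINLN IPNFQMTDDI DEVRWERGSTLV AEFKRKMKPF "
            "LKSGAFEILA NGDLKIKNLT RDDSGTYNVTVY STNGTRILDK ALDLRILEM")
RNCD2_STAR = ("ADCRDSGTVW GALGHGINLN IPNFQMTDDI DEVRWERGSTLV AEFKRKMKPF "
              "LKSGAFEILA NGTLKIKELT RDDSGTYEVTVY STDGTRILDK ALDLRILEM")
ACYP2_WT = ("MSTAQSLKSV DYEVFGRVQG VCFRMYTEDE ARKIGVVGWV KNTSKGTVTG "
            "QVQGPEDKVN SMKSWLSKVG SPSSRIDRTN FSNEKTISKL EYSNFSIRY")
ACYP2_STAR = ("MSTAQSLKSV DYEVFGRVQG VCFRMYTEDE ARKIGVVGWV KNTANGTVTG "
              "QVQGPEDKVN SMKSWLSKVG SPSSRIDRTN FANEKTISKL EYSNFAIRY")

if __name__ == "__main__":
    print(find_sequons(RNCD2_WT, 3))    # [72, 82, 89]  (numbering starts at Ala3)
    print(find_sequons(RNCD2_STAR, 3))  # [65]
    print(find_sequons(ACYP2_WT, 1))    # [42, 80, 94]  (numbering starts at Met1)
    print(find_sequons(ACYP2_STAR, 1))  # [45]
```

The check confirms that each wild-type protein carries three native sequons and that each engineered variant retains only the intended site.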

Pin1 Amino Acid Sequence

Peptidyl-prolyl cis-trans isomerase NIMA-interacting 1 (Pin1) is an enzyme (EC 5.2.1.8) that regulates mitosis presumably by interacting with NIMA and attenuating its mitosis-promoting activity. The enzyme displays a preference for an acidic residue N-terminal to the isomerized proline bond. The enzyme catalyzes pSer/Thr-Pro cis/trans isomerizations, and its amino acid residue sequence in single letter code is shown below, from left to right and from N-terminus to C-terminus.

SEQ ID NO: 178 MADEEKLPPG WEKRMSRSSG RVYYFNHITN ASQWERPSGN SSSGGKNGQG EPARVRCSHL LVKHSQSRRP SSWRQEKITR TKEEALELIN GYIQKIKSGE EDFESLASQF SDCSSAKARG DLGAFSRGQM QKPFEDASFA LRTGEMSGPV FTDSGIHIIL RTE

Pin1WW Domain Amino Acid Sequence

N-terminal residues 6 through 44 constitute the WW domain of Pin1 [Ranganathan et al., Cell 89, 875-886 (1997)]. The WW domain sequences used herein for illustration span positions 6 through 39. Amino acid residue changes made to the WW domain are designated by the original residue position, counted from the N-terminus. The sequences used herein are shown in the tables below, along with their expected and observed MALDI-TOF [M+H⁺] values.

TABLE F
Protein   Sequence* (residues 6-39)                     SEQ ID NO
4         KLPPG WEKRM S--NG RVYYF NHITN ASQFE RPSG      179
4g        KLPPG WEKRM S--[N]G RVYYF NHITN ASQFE RPSG    180
4-F       KLPPG WEKRM F--NG RVYYF NHITN ASQFE RPSG      181
4g-F      KLPPG WEKRM F--[N]G RVYYF NHITN ASQFE RPSG    182
4-T       KLPPG WEKRM S--NG TVYYF NHITN ASQFE RPSG      183
4g-T      KLPPG WEKRM S--[N]G TVYYF NHITN ASQFE RPSG    184
4-F,T     KLPPG WEKRM F--NG TVYYF NHITN ASQFE RPSG      185
4g-F,T    KLPPG WEKRM F--[N]G TVYYF NHITN ASQFE RPSG    186
5         KLPPG WEKRM S-ANG RVYYF NHITN ASQFE RPSG      187
5g        KLPPG WEKRM S-A[N]G RVYYF NHITN ASQFE RPSG    188
5-F       KLPPG WEKRM F-ANG RVYYF NHITN ASQFE RPSG      189
5g-F      KLPPG WEKRM F-A[N]G RVYYF NHITN ASQFE RPSG    190
5-T       KLPPG WEKRM S-ANG TVYYF NHITN ASQFE RPSG      191
5g-T      KLPPG WEKRM S-A[N]G TVYYF NHITN ASQFE RPSG    192
5-F,T     KLPPG WEKRM F-ANG TVYYF NHITN ASQFE RPSG      193
5g-F,T    KLPPG WEKRM F-A[N]G TVYYF NHITN ASQFE RPSG    194
6         KLPPG WEKRM SRSNG RVYYF NHITN ASQFE RPSG      195
6g        KLPPG WEKRM SRS[N]G RVYYF NHITN ASQFE RPSG    196
6-F       KLPPG WEKRM FRSNG RVYYF NHITN ASQFE RPSG      197
6g-F      KLPPG WEKRM FRS[N]G RVYYF NHITN ASQFE RPSG    198
6-T       KLPPG WEKRM SRSNG TVYYF NHITN ASQFE RPSG      199
6g-T      KLPPG WEKRM SRS[N]G TVYYF NHITN ASQFE RPSG    200
6-F,T     KLPPG WEKRM FRSNG TVYYF NHITN ASQFE RPSG      201
6g-F,T    KLPPG WEKRM FRS[N]G TVYYF NHITN ASQFE RPSG    202
* [N] = Asn(GlcNAc); dash = deletion

TABLE G MALDI-TOF [M + H⁺]
Protein   Expected [M + H⁺] (amu)†   Observed [M + H⁺] (amu)
4         3766.9                     3766.8
4g        3969.9                     3972
4-F       3826.9                     3826.4
4g-F      4030.0                     4030.5
4-T       3711.8                     3711.8
4g-T      3914.9                     3916.3
4-F,T     3771.9                     3770.8
4g-F,T    3974.9                     3975.1
5         3837.9                     3837.4
5g        4041.0                     4041.7
5-F       3897.9                     3898.1
5g-F      4101.0                     4101.6
5-T       3782.9                     3783.2
5g-T      3985.9                     3986.2
5-F,T     3842.9                     3842.7
5g-F,T    4046.0                     4045.4
6         4010.0                     ‡
6g        4213.1                     ‡
6-F       4070.0                     ‡
6g-F      4273.1                     ‡
6-T       3954.9                     ‡
6g-T      4158.0                     ‡
6-F,T     4015.0                     ‡
6g-F,T    4218.1                     ‡
*N = Asn(GlcNAc); †Monoisotopic masses; ‡Determined previously [Culyba et al., Science 331, 571-575 (2011)].
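The expected values in Table G can be reproduced from standard monoisotopic residue masses. The following is a minimal Python sketch, assuming the WW domains are free C-terminal acids with otherwise unmodified side chains, and that each glycosylated Asn carries one β-GlcNAc (+203.0794 Da). The residue-mass table contains standard reference values and is not taken from this document; `mh_plus` is a hypothetical helper name.

```python
# Standard monoisotopic residue masses (Da), i.e. free amino acid minus H2O.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # H2O completing the free N-terminus and C-terminal acid
PROTON = 1.00728    # charge carrier of the [M+H]+ ion
GLCNAC = 203.07937  # one beta-GlcNAc on an Asn side chain (C8H13NO5)


def mh_plus(sequence, n_glcnac=0):
    """Monoisotopic [M+H]+ of a linear peptide acid.

    Spaces and dashes (deletions in Table F) are ignored.
    """
    residues = sequence.replace(" ", "").replace("-", "")
    peptide = sum(RESIDUE_MASS[aa] for aa in residues) + WATER
    return peptide + PROTON + n_glcnac * GLCNAC


if __name__ == "__main__":
    ww4 = "KLPPG WEKRM S--NG RVYYF NHITN ASQFE RPSG"
    print(round(mh_plus(ww4), 1))              # 3766.9 (protein 4)
    print(round(mh_plus(ww4, n_glcnac=1), 1))  # 3969.9 (protein 4g)
```

The fixed offsets in Table G follow directly from these masses. Each "g" protein is 203.1 amu heavier than its parent, each Ser-to-Phe change adds 60.0 amu, and each Arg-to-Thr change removes 55.1 amu.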

Structural Coordinates

RnCD2 structural coordinates were obtained from the PDB (accession code 1HNG). AcyP2 structural coordinates were taken from the PDB entry for horse muscle acylphosphatase (accession code 1APS), which shares 94% sequence homology with the human protein. Coordinates were manipulated and rendered using PyMOL software (Schrödinger LLC).

Molecular Biology

All PCR was performed with Pfu Turbo® DNA polymerase (Stratagene) under the recommended conditions. Restriction enzymes were obtained from New England Biolabs and used as directed. DNA fragments were ligated with T4 DNA ligase (Roche) under the supplied standard conditions. Amplified and digested DNA was purified on 1% agarose gels (molecular biology grade, prepared in TAE buffer). DNA isolation and purification steps, including genomic isolation, plasmid isolation, restriction digestion clean-up, and PCR purification, were performed with Qiagen kits. Clones were transformed, amplified, and maintained in E. coli DH5α. All clones were verified by sequencing.

Protein Purification Steps on FPLC

All FPLC procedures were carried out on an AKTA FPLC from GE Healthcare. HisTrap™ HP columns (1 mL) were run in 25 mM sodium phosphate, 300 mM NaCl, 5-300 mM imidazole, pH 8.0 at a flow rate of 3 mL/minute at room temperature. A Superdex™ 75 10/300 GL column (24 mL) was run in PBS (RnCD2*) or acetate (AcyP2*) at a flow rate of 0.4 mL/minute at room temperature (retention times: RnCD2* with glycan 12.5 minutes, RnCD2* without glycan 12.75 minutes, AcyP2* with glycan 14.75 minutes, AcyP2* without glycan 15 minutes).

Fluorescence Spectrometry

Both RnCD2* and AcyP2* have at least one tryptophan residue buried in the hydrophobic core, which gives an intrinsic fluorescence that depends on folding status. Fluorescence measurements for RnCD2* and AcyP2* variants were obtained using either a CARY Eclipse (Varian) or an ATF-105 (Aviv) fluorescence spectrometer. Unless otherwise noted, measurements were made in quartz cuvettes at 25° C. and at protein concentrations of 5-30 μg/mL. Fluorescence emission spectra were collected from 315 to 400 nm following excitation at 280 nm.

CD Spectrometry

CD measurements were made using an Aviv™ 62A DS spectropolarimeter, using quartz cuvettes with path lengths of 0.1 or 1 cm. WW domain solutions were prepared in 20 mM sodium phosphate buffer, pH 7; protein solution concentrations were determined spectroscopically from tyrosine and tryptophan absorbance at 280 nm in 6 M guanidine hydrochloride+20 mM sodium phosphate (ε_(Trp)=5690 M⁻¹cm⁻¹, ε_(Tyr)=1280 M⁻¹cm⁻¹) as described previously [Price et al., J. Am. Chem. Soc. 132, 15359-15367 (2010); and Edelhoch Biochemistry 6, 1948-1954 (1967)]. CD spectra were obtained by monitoring molar ellipticity from 340 to 200 nm in 1 nm increments, with 5-second averaging times. Variable temperature CD data were obtained by monitoring molar ellipticity at 227 nm from 0.2 to 98.2° C. at 2° C. intervals, with 90 second equilibration time between data points and 30 second averaging times. The variable temperature CD data were fit to obtain T_(m) and ΔG_(f) values for each protein, as described previously [Price et al., J. Am. Chem. Soc. 132, 15359-15367 (2010)], and elsewhere herein.
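The spectroscopic concentration determination described above applies the Beer-Lambert law with an extinction coefficient summed from the Trp and Tyr contributions in 6 M GdnHCl. The following is a minimal Python sketch. It assumes a 1 cm path length by default and ignores any cystine term, since the WW domain sequences in Table F contain no Cys. The helper names are illustrative.

```python
# Extinction coefficients at 280 nm in 6 M GdnHCl, as given in the text.
EPS_TRP = 5690.0  # M^-1 cm^-1
EPS_TYR = 1280.0  # M^-1 cm^-1


def extinction_coefficient(sequence):
    """Molar extinction coefficient at 280 nm from the Trp and Tyr counts."""
    return sequence.count("W") * EPS_TRP + sequence.count("Y") * EPS_TYR


def concentration_molar(a280, sequence, path_cm=1.0):
    """Beer-Lambert law: c = A280 / (epsilon * path length)."""
    return a280 / (extinction_coefficient(sequence) * path_cm)


if __name__ == "__main__":
    ww6 = "KLPPG WEKRM SRSNG RVYYF NHITN ASQFE RPSG"  # 1 Trp, 2 Tyr
    print(extinction_coefficient(ww6))                # 8250.0
    print(concentration_molar(0.1, ww6) * 1e6, "uM")
```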

Preparation of RnCD2* and AcyP2* Variants

Construction of Non-Glycosylated Variant Genes

Genes for non-glycosylated versions of RnCD2* and AcyP2* were subcloned into pT7-7 expression vectors using the PIPE method [Klock et al., Methods Mol Biol 498, 91-103 (2009)] to create pHisFLAG-RnCD2b and pHisFLAG-AcyP2b, whose sequences lack the native sequons. The complete protein-coding region, from N- to C-terminus, is: Met-6His-TEVs-FLAGtag-FXas-RnCD2* or AcyP2*.

Site Directed Mutagenesis:

All mutant variants were engineered from these constructs by QuikChange site-directed mutagenesis.

Expression of Non-Glycosylated Variants in E. coli (Rich Medium)

Bacterial RnCD2* and AcyP2* were expressed as described previously [Hanson et al., Proc Natl Acad Sci USA 106, 3131-3136 (2009)].

Nickel Affinity Purification

Cells were thawed and resuspended in the appropriate purification buffer (RnCD2* variants: 25 mM sodium phosphate, 300 mM NaCl, 5 mM imidazole, 0.5 mM TCEP, pH 8.0; AcyP2* variants: the same, with 25 mM TrisHCl in place of phosphate) at 1/20th of the original growth volume. Protease inhibitors (1 tablet/50 mL; Roche EDTA-free) were added, and cells were lysed by sonication. The lysate was centrifuged (15,000 rpm, 30 minutes, 4° C.), and the soluble fraction (supernatant) was separated from the insoluble fraction (pellet) and used for Ni-NTA purification. For RnCD2* variants RnCD2*-K and RnCD2*-KF and AcyP2* variant AcyP2*-F, the insoluble fraction was treated with 6 M guanidine hydrochloride (GdnHCl) in the appropriate binding buffer and subjected to Ni-NTA purification under denaturing conditions (6 M GdnHCl).

Superflow™ Ni-NTA resin was used to affinity purify proteins via the 6×His tag, using conditions described in the Qiagen manual. Denaturing purification was performed similarly with the addition of 6 M GdnHCl to all solutions. Eluted fractions were exchanged into Factor Xa cleavage buffer (50 mM TrisHCl, 100 mM NaCl, pH 7.9) and concentrated in Amicon centrifugation devices.

Factor Xa Cleavage of N-Terminal Tags from Non-Glycosylated Proteins

5 mM CaCl₂ was added to concentrated protein in 50 mM TrisHCl, 100 mM NaCl, pH 7.9 before Factor Xa (New England Biolabs) treatment (1 μg Factor Xa per 100 μg of RnCD2* or AcyP2*). For RnCD2* variants, the protease reaction was carried out at 4° C. for 12 hours. For AcyP2* variants, the protease reaction was carried out at 25° C. for 2 hours. The cleavage mixture was quenched with 100 μM PMSF, and the products were separated and buffer-exchanged by FPLC (Superdex® 75). RnCD2* final buffer: PBS, 0.5 mM TCEP, 0.01% sodium azide, pH 7.2. AcyP2* final buffer: 50 mM acetate, 0.5 mM TCEP, 0.01% sodium azide, pH 5.5. HisFLAG-free RnCD2* ESI MS found: 11578; RnCD2*-K ESI MS found: 11576; RnCD2*-F ESI MS found: 11612; RnCD2*-KF ESI MS found: 11611. HisFLAG-free AcyP2* ESI MS found: 11078; AcyP2*-F ESI MS found: 11124.

Glycosylated Variants

Cloning for RnCD2 and AcyP2 into an Insect Shuttle Vector

A 5′ SacI site (gagctc), a 3′ KpnI site (ggtacc), and a preprotrypsin leader sequence (PLS, for secretion into the medium) were designed into both the RnCD2 and AcyP2 genes ordered from IDT. Digestion of the genes and of the insect shuttle vector pFastBac™ (Invitrogen) with SacI and KpnI, followed by ligation, yielded clones pPLSHisFLAG-RnCD21 and pPLSHisFLAG-AcyP21 (sometimes referred to herein as RnCD21 and AcyP21, respectively).

Site Directed Mutagenesis

All mutant variants were engineered from these constructs [in the pFastBac™ vector (Invitrogen)] by QuikChange site-directed mutagenesis.

Expression of RnCD2* and AcyP2* in Sf9 (Insect) Cells

Expression in insect cells was carried out as previously described [Hanson et al., Proc Natl Acad Sci U S A 106, 3131-3136 (2009)]. After expression, the growth medium was collected and filtered (0.2 μm). Protease inhibitors (1 tablet/200 mL; Roche EDTA-free), 0.5 mM TCEP, and 1 mM EDTA were added to the filtered medium.

Ammonium Sulfate Precipitation of Glycosylated Variants

Growth medium was incubated for 1 hour with ammonium sulfate (30% wt/vol) at 4° C. with constant stirring and precipitating species were removed. Addition of more ammonium sulfate (80% total wt/vol) to the soluble fraction for 1 hour at 4° C. resulted in the precipitation of either RnCD2* or AcyP2* variants from the medium. The precipitate was collected with centrifugation followed by vacuum filtration (Whatman Grade 5 qualitative filter paper). Precipitate was stored at −80° C.

Purification of Glycosylated Variants by Nickel Affinity Chromatography

Superflow® Ni-NTA resin (Qiagen) was used to affinity-purify proteins via the 6×His tag, using conditions described in the Qiagen manual. Briefly, precipitated protein was resuspended in ¼ of the expression volume of lysis buffer (the same as for non-glycosylated variants), stirred for 1 hour at 4° C., and filtered (0.2 μm). The filtered solution was applied to a gravity Ni-NTA column in the appropriate lysis buffer and washed with 10 column volumes of lysis buffer and 50 column volumes of wash buffer (18 mM imidazole). Bound protein was eluted with 4 column volumes of elution buffer (20 mM TrisHCl, 300 mM imidazole, pH 8.0 for all variants).

Alternatively, an FPLC HisTrap HP column (1 mL) was used for purification under the same buffer conditions. Eluted fractions were exchanged into Concanavalin A (ConA) binding buffer (25 mM TrisHCl, 500 mM NaCl, 1 mM MnCl₂, 1 mM CaCl₂, pH 7.4) containing 0.5 mM TCEP and concentrated in Amicon centrifugation devices.

Isolation of Glycosylated Protein by Lectin Chromatography

Lectin chromatography with Concanavalin A (ConA) was performed on the nickel column eluate with the ConA Glycoprotein Isolation Kit (Pierce), following the protocols supplied with the kit. High-mannose and paucimannose species were separated from the non-glycosylated protein present in every expression. Elution and wash fractions that contained only glycosylated protein were pooled and exchanged into Factor Xa cleavage buffer (50 mM TrisHCl, 100 mM NaCl, pH 7.9).

Factor Xa Cleavage of N-Terminal Tags from Glycosylated Proteins

5 mM CaCl₂ was added to concentrated protein in 50 mM TrisHCl, 100 mM NaCl, pH 7.9 before Factor Xa (New England Biolabs) treatment (1 μg Factor Xa per 100 μg of RnCD2* or AcyP2*). For RnCD2* variants, the protease reaction was carried out at 4° C. for 12 hours. For AcyP2* variants, the protease reaction was carried out at 25° C. for 2 hours. The cleavage mixture was quenched with 100 μM PMSF, and the products were separated and buffer-exchanged by FPLC (Superdex® 75). RnCD2* variant final buffer: PBS, 0.5 mM TCEP, 0.01% sodium azide, pH 7.2. AcyP2* variant final buffer: 50 mM acetate, 0.5 mM TCEP, 0.01% sodium azide, pH 5.5. If cleavage was incomplete, Ni-NTA resin was used to remove uncleaved protein.

ESI-MS Characterization

Liquid Chromatography Mass Spectrometry (LCMS)

LCMS analysis was performed using an Agilent 1100 LC coupled to an Agilent 1100 single quadrupole ESI mass spectrometer. LC was performed with a 4.6 mm×50 mm ZORBAX C8 column (Agilent Technologies, Inc.).

TABLE MS characterization of glycosylated RnCD2* and AcyP2* variants
Variant                        MW_(expected)   MW_(found)   %     Structure
g-RnCD2* (SEQ ID NO: 203)      12956           12956        25    Man₆GlcNAc₂
                               13119           13118        44    Man₇GlcNAc₂
                               13282           13282        31    Man₈GlcNAc₂
g-RnCD2*-K (SEQ ID NO: 168)    12468           12469        6     Man₃GlcNAc₂
                               12792           12793        13    Man₅GlcNAc₂
                               12955           12956        22    Man₆GlcNAc₂
                               13118           13118        43    Man₇GlcNAc₂
                               13281           13280        16    Man₈GlcNAc₂
g-RnCD2*-F (SEQ ID NO: 170)    12990           12991        23    Man₆GlcNAc₂
                               13153           13153        54    Man₇GlcNAc₂
                               13316           13315        23    Man₈GlcNAc₂
g-RnCD2*-KF (SEQ ID NO: 172)   12989           12989        21    Man₆GlcNAc₂
                               13152           13151        33    Man₇GlcNAc₂
                               13315           13314        46    Man₈GlcNAc₂
AcyP2* (SEQ ID NO: 175)        12117           12116        100   Man₃GlcNAc₂(Fuc)
AcyP2* (SEQ ID NO: 175)        12163           12162        100   Man₃GlcNAc₂(Fuc)

Details for the Characterization of RnCD2* Variants: Folding Kinetics and Thermodynamics

General

PBS buffer (1×, 0.5 mM TCEP, 0.01% sodium azide, pH 7.2) was made fresh daily from a 10× stock and filtered. Urea and guanidine solutions were prepared fresh daily in 1× PBS and filtered, and their concentrations were confirmed by index of refraction (IOR). Subsequent dilutions of urea or guanidine were made with 1× PBS, and concentrations were checked by IOR. Constants defined in equations include the universal gas constant (R) and temperature (T). The value of RT at 25° C. was taken to be 0.592 kcal/mol. Data were imported and fit in Mathematica® 7 software (Wolfram Research). Urea was used exclusively as the chaotrope for all RnCD2* variants except the L63F variants. Because of the high thermodynamic stability of the L63F variants and the solubility limit of urea at 25° C., all measurements for these variants (g-RnCD2*-F and RnCD2*-F) were also taken in guanidine hydrochloride solutions. Further data can be found in Culyba et al., Science 331, 571-575 (2011).
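The text does not state which calibration was used to convert index of refraction to denaturant molarity. A common choice is the empirical cubic polynomial of Pace (1986) in ΔN, the refractive index of the denaturant solution minus that of the buffer. The following Python sketch assumes those published coefficients, and the function names are illustrative. It shows how such a check might be computed and is not necessarily the procedure used here.

```python
def urea_molarity(delta_n):
    """[urea] in M from delta_n = n(denaturant solution) - n(buffer).

    Assumed Pace (1986) calibration polynomial.
    """
    return 117.66 * delta_n + 29.753 * delta_n**2 + 185.56 * delta_n**3


def gdnhcl_molarity(delta_n):
    """[GdnHCl] in M from delta_n, same convention and assumption."""
    return 57.147 * delta_n + 38.68 * delta_n**2 - 91.60 * delta_n**3


if __name__ == "__main__":
    for dn in (0.01, 0.03, 0.05):
        print(f"delta_n={dn:.2f}: urea {urea_molarity(dn):.2f} M, "
              f"GdnHCl {gdnhcl_molarity(dn):.2f} M")
```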

Folding Kinetics of RnCD2* Variants

Fluorescence measurements for kinetic studies were obtained using an AVIV® ATF-105 stopped-flow fluorimeter for single-mixing studies. The set-up consisted of two syringes (syringe 1: 1 mL; syringe 2: 2 mL) that permitted up to a 25-fold dilution of the contents of syringe 1 with syringe 2, in a minimum of 80 μL, of which the flow cell holds 40 μL. The dead time between the start of mixing and data acquisition was estimated to be 50-100 ms; in general, only data after the first 200 ms were used for fitting.

Excitation was set at 280 nm (bandwidth: 2 nm) and emission was measured at 330 nm (bandwidth: 8 nm). The photomultiplier voltage was set to 1000 V and data was recorded for 20-200 seconds.

For unfolding studies, the decrease in intensity at 330 nm was monitored after native protein in PBS or low concentrations of urea or guanidine in syringe 1 was mixed with varying volumes of concentrated urea or guanidine solutions in syringe 2. For refolding studies, the increase in intensity at 330 nm was monitored after denatured protein in a urea or guanidine solution in syringe 1 was diluted with varying volumes of PBS buffer or low concentrations of urea or guanidine from syringe 2. All shots of a particular dilution were typically repeated at least 4 times.
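The final denaturant concentration in each stopped-flow shot follows from simple volume-weighted mixing. The following is a minimal Python sketch, assuming ideal volume additivity. In the sketch, `ratio` is a hypothetical parameter giving the volume delivered from syringe 2 per volume from syringe 1, so the 25-fold maximum dilution corresponds to `ratio = 24`.

```python
def after_mixing(c_syringe1, c_syringe2, ratio):
    """Denaturant concentration (M) after mixing 1 volume of syringe 1
    with `ratio` volumes of syringe 2 (ideal volume additivity assumed)."""
    return (c_syringe1 + ratio * c_syringe2) / (1.0 + ratio)


def protein_dilution_factor(ratio):
    """Fold-dilution of the protein loaded in syringe 1."""
    return 1.0 + ratio


if __name__ == "__main__":
    # Refolding: denatured protein in 8 M denaturant diluted 25-fold into buffer.
    print(after_mixing(8.0, 0.0, 24.0), protein_dilution_factor(24.0))
    # Unfolding: native protein in buffer mixed 1:4 with 9 M denaturant.
    print(after_mixing(0.0, 9.0, 4.0))
```

The example concentrations (8 M, 9 M) are illustrative and are not the values used in these studies.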

Continuous irradiation of RnCD2* at 280 nm led to a decrease in fluorescence intensity over time that correlated with the excitation bandwidth, indicating that photobleaching was taking place. The fluorescence intensity at 330 nm (F₃₃₀) was therefore fit to a double exponential containing a photobleaching (k_(pb)) component and a folding/unfolding component (k_(obs)):

$F_{330} = e^{-k_{pb}t}\left(c_{1} + c_{2}e^{-k_{obs}t}\right)$  [1]

where t is time, c₁ is the fluorescence intensity of the final state in the absence of photobleaching (so that c₁ + c₂ is the intensity at t=0), and c₂ is the difference in fluorescence between the initial and final states. Note that c₂ was positive in unfolding studies and negative in refolding studies. None of the kinetic studies gave any indication that Eq. 1 was inadequate to describe the observed folding kinetics; thus, after accounting for photobleaching, folding was a monoexponential process for all variants.
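Eq. 1 can be written as a plain function, together with the photobleaching correction it implies, which is division by $e^{-k_{pb}t}$. The following is a minimal Python sketch. Its parameter names mirror Eq. 1, the helper names are illustrative, and the numbers in the usage lines are hypothetical.

```python
import math


def f330(t, c1, c2, k_pb, k_obs):
    """Eq. 1: photobleaching decay multiplying a single-exponential folding phase."""
    return math.exp(-k_pb * t) * (c1 + c2 * math.exp(-k_obs * t))


def remove_photobleaching(t, f, k_pb):
    """Divide out exp(-k_pb * t), leaving c1 + c2 * exp(-k_obs * t)."""
    return f * math.exp(k_pb * t)


if __name__ == "__main__":
    # Hypothetical unfolding trace (c2 > 0): intensity falls as protein unfolds.
    for t in (0.0, 0.5, 1.0, 5.0):
        raw = f330(t, c1=1.0, c2=0.5, k_pb=0.01, k_obs=2.0)
        print(t, round(raw, 4), round(remove_photobleaching(t, raw, 0.01), 4))
```

With c₂ > 0 the photobleaching-corrected trace decreases with time, as in unfolding. With c₂ < 0 it increases, as in refolding.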

Thermodynamic Stability of RnCD2* Variants Using Chaotrope Denaturation

All fluorescence measurements for equilibrium chaotrope denaturation studies were taken on a CARY Eclipse fluorescence spectrophotometer. The temperature at reading was kept constant at 25° C. using a CARY single cell Peltier accessory (Agilent Technologies).

For equilibrium denaturation studies, solutions of RnCD2* variants were prepared in PBS and in high-concentration urea or guanidine (in 1× PBS) at matched protein concentrations (15-20 μg/mL). The solutions were mixed to produce approximately thirty 120 μL samples at regular intervals of urea or guanidine concentration. Solutions were allowed to equilibrate for at least 30 minutes before fluorescence emission spectra were scanned, and the average of three scans was taken.
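Because the native and denaturant stocks have matched protein concentrations, every sample in the series has the same protein concentration, and only the mixing ratio varies. The following is a minimal Python sketch of such a mixing scheme. It assumes evenly spaced denaturant concentrations from 0 M up to the stock concentration and ideal volume additivity, since the exact range used is not stated. The stock concentration in the usage line is hypothetical.

```python
def titration_series(stock_molar, n_samples=30, sample_ul=120.0):
    """Return (final [D] in M, uL of denaturant stock, uL of native stock)
    for evenly spaced samples; protein concentration is constant throughout."""
    series = []
    for i in range(n_samples):
        frac = i / (n_samples - 1)      # fraction of the sample taken from the denaturant stock
        v_denat = sample_ul * frac
        series.append((stock_molar * frac, v_denat, sample_ul - v_denat))
    return series


if __name__ == "__main__":
    for conc, v_denat, v_native in titration_series(9.0)[:3]:
        print(f"{conc:.2f} M: {v_denat:.1f} uL denaturant + {v_native:.1f} uL native")
```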

Global Fit to Kinetic and Equilibrium Data

Plots of the natural logarithm of the observed rate constant for equilibration between the folded and unfolded states of a protein, ln k_(obs), vs. denaturant concentration have a characteristic V shape (hence the term “chevron plot”). The quantity k_(obs) is equal to the sum of the unfolding and folding rate constants, k_(u) and k_(f). Chevron plots therefore reflect the dependence of ln k_(u) and ln k_(f) on denaturant concentration. The unfolding rate constant dominates k_(obs) at high denaturant concentrations, where the chevron plots for several of the RnCD2* variants are slightly curved. Curvature in the unfolding arm of a chevron plot is often attributed to changes in the structure of the folding transition state. This behavior is accounted for by assuming that ln k_(u) has a quadratic dependence on denaturant concentration:

$\ln k_{u} = \ln k_{u,0} + m_{u1}[D] + m_{u2}[D]^{2}$  [2]

where [D] is denaturant concentration, k_(u,0) is the unfolding rate constant at [D]=0, and m_(u1) and m_(u2) are the coefficients of the linear and squared terms in the dependence of ln k_(u) on [D]. The folding rate constant dominates k_(obs) at low denaturant concentrations, where, again, the chevron plots for many of the RnCD2* variants are curved. This has been observed previously by Parker et al. [Parker et al., Biochemistry 36, 13396-13405 (1997)], and was attributed to the rapid formation of an off-pathway intermediate. Thus, the effective folding rate constant, k_(f)*, depends as follows on denaturant concentration:

$k_{f}^{*} = f_{u}k_{f} = \dfrac{e^{\ln k_{f,0} + m_{f}[D]}}{1 + e^{\ln K_{i,0} + m_{i}[D]}}$  [3]

where f_(u) is the fraction of not-yet-folded protein that is in the unfolded state (rather than in the off-pathway intermediate state; i.e., f_(u)=[U]/([U]+[I])=1/(1+K_(i))), k_(f) is the true folding rate constant at a given denaturant concentration, [D] is denaturant concentration, k_(f,0) is the true folding rate constant at [D]=0, m_(f) is the slope of the dependence of ln k_(f) on [D], K_(i,0) is the equilibrium constant for formation of the off-pathway intermediate at [D]=0, and m_(i) is the slope of the dependence of ln K_(i) on [D]. Summing the expressions for k_(f)* and k_(u) yields an equation for k_(obs):

$\begin{matrix} {{\ln \; k_{obs}} = {{\ln \left( {k_{u} + k_{f}^{*}} \right)} = {\ln \left( {{^{\ln \; k_{u,0}}^{{m_{u\; 1}{\lbrack D\rbrack}} + {m_{u\; 2}{\lbrack D\rbrack}}^{2}}} + \frac{^{{\ln \; k_{f,0}} + {m_{f}{\lbrack D\rbrack}}}}{1 + ^{{\ln \; K_{i,0}} + {m_{i}{\lbrack D\rbrack}}}}} \right)}}} & \lbrack 4\rbrack \end{matrix}$

This equation can be fit to folding kinetics vs. denaturant concentration data to get the parameters of interest (primarily k_(f,0) and k_(u,0)). However, the robustness of the fit can be improved by simultaneously fitting kinetics and equilibrium data. The folding equilibrium constant at a given denaturant concentration (K_(f)) is related to the parameters above as follows:

$K_{f} = \dfrac{k_{f}}{k_{u}} = \dfrac{e^{\ln k_{f,0} + m_{f}[D]}}{e^{\ln k_{u,0} + m_{u1}[D] + m_{u2}[D]^{2}}}$  [5]

This expression can be inserted into the equation for fluorescence-detected equilibrium denaturation:

$F = F_{f,0} + \phi_{f}[D] + \dfrac{\Delta F + \Delta\phi[D]}{1 + K_{f}} = F_{f,0} + \phi_{f}[D] + \dfrac{\Delta F + \Delta\phi[D]}{1 + \dfrac{e^{\ln k_{f,0} + m_{f}[D]}}{e^{\ln k_{u,0} + m_{u1}[D] + m_{u2}[D]^{2}}}}$  [6]

where F is the total fluorescence, F_(f,0) is the fluorescence of the folded protein at [D]=0, φ_(f) is the slope of the fluorescence of the folded state vs. [D], ΔF is the difference in fluorescence between the unfolded and folded states, and Δφ is the difference between the slopes of the fluorescence of the unfolded and folded states vs. [D]. Several parameters appear in both the kinetic and the equilibrium models of the [D] dependence, which is what makes the simultaneous fitting of kinetic and equilibrium data described above possible.
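The parameter sharing between the kinetic model (Eqs. 2-4) and the equilibrium model (Eqs. 5-6) becomes explicit when both are written against one parameter dictionary. The following is a minimal Python sketch of the two model functions only; the fitting itself was done in Mathematica and is not reproduced here. The dictionary keys and function names are illustrative. K_f is evaluated as a difference of logarithms to avoid overflow at low [D].

```python
import math


def ln_k_u(d, p):
    """Eq. 2: ln k_u, quadratic in denaturant concentration d (M)."""
    return p["ln_ku0"] + p["mu1"] * d + p["mu2"] * d * d


def ln_k_f(d, p):
    """Linear dependence of the true folding rate constant, ln k_f, on d."""
    return p["ln_kf0"] + p["mf"] * d


def ln_k_obs(d, p):
    """Eq. 4: ln(k_u + k_f*), with k_f* = k_f / (1 + K_i) from Eq. 3."""
    k_i = math.exp(p["ln_Ki0"] + p["mi"] * d)   # off-pathway intermediate
    k_f_eff = math.exp(ln_k_f(d, p)) / (1.0 + k_i)
    return math.log(math.exp(ln_k_u(d, p)) + k_f_eff)


def fluorescence_eq(d, p):
    """Eqs. 5-6: equilibrium fluorescence using the same kinetic parameters."""
    k_fold = math.exp(ln_k_f(d, p) - ln_k_u(d, p))   # K_f = k_f / k_u
    return p["F_f0"] + p["phi_f"] * d + (p["dF"] + p["dphi"] * d) / (1.0 + k_fold)


if __name__ == "__main__":
    params = dict(ln_ku0=math.log(0.01), mu1=1.0, mu2=0.0,
                  ln_kf0=math.log(10.0), mf=-1.0, ln_Ki0=-50.0, mi=0.0,
                  F_f0=100.0, phi_f=0.0, dF=-40.0, dphi=0.0)
    for d in (0.0, 2.0, 4.0, 6.0):
        print(d, round(ln_k_obs(d, params), 3), round(fluorescence_eq(d, params), 2))
```

The parameter values are hypothetical. A residual function for a global fit would evaluate `ln_k_obs` at the kinetic data points and `fluorescence_eq` at the equilibrium data points with the same `params`.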

To ensure that the kinetic and equilibrium data had equal influence on the parameter estimates, the equilibrium data were weighted as follows: 1) the equilibrium and kinetic data were fit separately to their models; 2) the root mean squared residuals for the two fits were calculated; 3) the ratio of the kinetic and equilibrium RMS residuals was calculated (RMS_(kinetic)/RMS_(equilibrium)); 4) the equilibrium data points were multiplied by this ratio. The combined kinetic and (weighted) equilibrium data sets were then fit simultaneously to the combined kinetic and equilibrium model using Mathematica® 7.0 (Wolfram Research). The fit yielded estimates for k_(f,0) and k_(u,0) which were converted to a folding free energy (ΔG_(f,0)) through the relation:

$\Delta G_{f,0} = -RT\ln K_{f,0} = -RT\ln\left(k_{f,0}/k_{u,0}\right)$  [7]

The slope of the dependence of ΔG_(f,0) on [D] at [D]=0, m_(eq,0), was determined from the values of m_(f) and m_(u1) through the relation:

$m_{eq,0} = -RT\left(m_{f} - m_{u1}\right)$  [8]

Further data from these studies can be found in Culyba et al., Science 331, 571-575 (2011).
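The residual-weighting step and Eqs. 7 and 8 reduce to a few lines of arithmetic. The following is a minimal Python sketch using RT = 0.592 kcal/mol as stated above. The function names are illustrative, and the input values in the usage lines are hypothetical, not data from this study.

```python
import math

RT = 0.592  # kcal/mol at 25 C, the value used in this work


def rms(residuals):
    """Root-mean-square of a list of residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def equilibrium_weight(kinetic_residuals, equilibrium_residuals):
    """Steps 2-3 of the weighting scheme: the ratio RMS_kinetic / RMS_equilibrium,
    by which the equilibrium data points are multiplied."""
    return rms(kinetic_residuals) / rms(equilibrium_residuals)


def delta_g_f0(k_f0, k_u0):
    """Eq. 7: folding free energy at [D] = 0 (kcal/mol); negative means stable."""
    return -RT * math.log(k_f0 / k_u0)


def m_eq0(m_f, m_u1):
    """Eq. 8: slope of the [D] dependence of the folding free energy at [D] = 0."""
    return -RT * (m_f - m_u1)


if __name__ == "__main__":
    print(round(delta_g_f0(100.0, 0.01), 2))  # -5.45 kcal/mol for these inputs
    print(round(m_eq0(-1.0, 0.5), 3))         # 0.888 kcal/mol/M
```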

Details for the Characterization of AcyP2* Variants: Thermodynamics

General

Acetate buffer (50 mM acetate, 0.5 mM TCEP, 0.01% sodium azide, pH 5.5; Acetate) was made fresh daily from a 4× stock and filtered. Urea solutions were prepared fresh daily in 1× Acetate and filtered, and their concentrations were confirmed by index of refraction (IOR). Subsequent dilutions of urea were made with 1× Acetate, and concentrations were checked by IOR. Constants defined in equations include the universal gas constant (R) and temperature (T). The value of RT at 25° C. was taken to be 0.592 kcal/mol. Data were imported and fit in Microsoft Excel.

Thermodynamic Stability of AcyP2* Variants Using Chaotrope Denaturation

All fluorescence measurements for equilibrium chaotrope denaturation studies were taken on a CARY Eclipse fluorescence spectrophotometer. The temperature at reading was kept constant at 25° C. using a CARY single cell Peltier accessory (Agilent Technologies). Each chaotrope denaturation study was repeated at least three times for each variant.

For equilibrium denaturation studies, solutions of AcyP2* variants were prepared in Acetate and in high-concentration urea (in 1× Acetate) at matched protein concentrations (15-30 μg/mL). The solutions were mixed to produce approximately thirty 120 μL samples at regular intervals of urea or guanidine concentration. Solutions were allowed to equilibrate for at least 30 minutes before fluorescence emission spectra were scanned, and the average of three scans was taken. As with RnCD2*, unfolding of AcyP2* in response to increasing urea or guanidine concentration shifts the fluorescence spectrum and changes its intensity. Unfolding was therefore followed by plotting fluorescence intensity at single wavelengths (F_(λ)) versus chaotrope concentration. ΔG_(f,0) and m_(eq) values for AcyP2* were estimated by fitting fluorescence intensity at 330 nm (F₃₃₀) vs. urea concentration data to:

$F = F_{f,0} + \phi_{f}[D] + \dfrac{\Delta F + \Delta\phi[D]}{1 + e^{-\left(\Delta G_{f,0} + m_{eq}[D]\right)/RT}}$  [9]

where F is the total fluorescence, F_(f,0) is the fluorescence of the folded protein at [D]=0, φ_(f) is the slope of the fluorescence of the folded state vs. [D], ΔF is the difference in fluorescence between the unfolded and folded states, and Δφ is the difference between the slopes of the fluorescence of the unfolded and folded states vs. [D].

ΔG_(f,0) and m_(eq) values derived from individual chaotrope denaturation studies were averaged to give the reported ΔG_(f,0) and m_(eq) values and fits.
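Eq. 9 and the replicate averaging can be sketched in Python as follows. The fits themselves were done in Excel and are not reproduced here. This sketch only evaluates the model and averages per-replicate estimates, using RT = 0.592 kcal/mol as above. The function names and example numbers are hypothetical.

```python
import math
from statistics import mean

RT = 0.592  # kcal/mol at 25 C


def f_two_state(d, f_f0, phi_f, delta_f, delta_phi, dg_f0, m_eq):
    """Eq. 9: two-state fluorescence vs. urea concentration d (M)."""
    k_f = math.exp(-(dg_f0 + m_eq * d) / RT)   # folding equilibrium constant
    return f_f0 + phi_f * d + (delta_f + delta_phi * d) / (1.0 + k_f)


def average_fits(fits):
    """Average (dG_f0, m_eq) pairs obtained from replicate denaturation studies."""
    return mean(dg for dg, _ in fits), mean(m for _, m in fits)


if __name__ == "__main__":
    for d in (0.0, 2.0, 4.0):
        print(d, round(f_two_state(d, 100.0, 0.0, -40.0, 0.0, -4.0, 2.0), 2))
    print(average_fits([(-4.0, 1.0), (-5.0, 1.2)]))
```

At the denaturant concentration where ΔG_(f,0) + m_(eq)[D] = 0, the folding equilibrium constant is 1, and the signal lies halfway between the folded and unfolded baselines.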

Polypeptide Synthesis

General

Pin1 WW domain proteins were synthesized as C-terminal acids by solid phase peptide synthesis using a standard Fmoc Nα-protecting group strategy, either manually (protein WW) or by a combination of manual and automated methods. Proteins g-WW (SEQ ID NO:205), WW-F (SEQ ID NO:197), g-WW-F (SEQ ID NO:198), WW-T (SEQ ID NO:199), g-WW-T (SEQ ID NO:200), WW-F,T (SEQ ID NO:201), and g-WW-F,T (SEQ ID NO:202) were synthesized on an Applied Biosystems 433A automated peptide synthesizer, except that Fmoc-Asn(Ac₃GlcNAc)-OH was coupled manually, as discussed below. See also Price et al., J. Am. Chem. Soc. 132, 15359-15367 (2010).

Amino acids were activated by 2-(1H-benzotriazol-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate (HBTU, purchased from Advanced ChemTech) and N-hydroxybenzotriazole hydrate (HOBt, purchased from Advanced ChemTech). Fmoc-Gly-loaded NovaSyn® TGT resin and all Fmoc-protected α-amino acids (with acid-labile side-chain protecting groups) were purchased from EMD Biosciences, including the glycosylated amino acid Fmoc-Asn(Ac₃GlcNAc)-OH {N-α-Fmoc-N-[3-[3,4,6-tri-O-acetyl-2-(acetylamino)-deoxy-2-β-glucopyranosyl]-L-asparagine} [Meldal et al., Tetrahedron Lett. 31, 6987-6990 (1990); Otvos et al., Tetrahedron Lett. 31, 5889-5892 (1990)]. Piperidine and N,N-diisopropylethylamine (DIEA) were purchased from Aldrich, N-methylpyrrolidinone (NMP) was purchased from Applied Biosystems, and N,N-dimethylformamide (DMF) was obtained from Fisher.

A general protocol for manual solid phase peptide synthesis follows: Fmoc-Gly-loaded NovaSyn® TGT resin (217 mg, 50 μmol at 0.23 mmol/g resin loading) was aliquoted into a fritted polypropylene syringe and allowed to swell in CH₂Cl₂ and DMF. Solvent was drained from the resin using a vacuum manifold. To remove the Fmoc protecting group from the resin-linked amino acid, 2.5 mL of 20% piperidine in DMF was added to the resin, and the resulting mixture was stirred at room temperature for 5 minutes. The deprotection solution was drained from the resin with a vacuum manifold. An additional 2.5 mL of 20% piperidine in DMF was then added to the resin, and the resulting mixture was stirred at room temperature for 15 minutes. The deprotection solution was drained from the resin using a vacuum manifold, and the resin was rinsed five times with DMF.
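The resin quantity above follows from the synthesis scale and the resin loading. The short sketch below, with hypothetical helper names, shows the conversion in both directions; it assumes loading is expressed in mmol/g, as in the text.

```python
def resin_mass_mg(scale_umol, loading_mmol_per_g):
    """Mass of pre-loaded resin (mg) for a synthesis scale (umol).

    umol / (mmol/g) = mg, so no extra conversion factor is needed.
    """
    return scale_umol / loading_mmol_per_g


def scale_umol(resin_mg, loading_mmol_per_g):
    """Amount of resin-bound amino acid (umol) in a given resin mass (mg)."""
    return resin_mg * loading_mmol_per_g


print(round(resin_mass_mg(50, 0.23)))   # 217 (mg)
print(round(scale_umol(217, 0.23), 1))  # 49.9 (umol)
```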

For coupling of an activated amino acid to a newly deprotected amine on resin, the desired Fmoc-protected amino acid (250 μmol, 5 eq.) and HBTU (250 μmol, 5 eq.) were dissolved by vortexing in 2.5 mL of 0.1 M HOBt (250 μmol, 5 eq.) in NMP. To the dissolved amino acid solution was added 87.1 μL DIEA (500 μmol, 10 eq.). Only 1.5 eq. of amino acid was used for coupling the expensive Fmoc-Asn(Ac₃GlcNAc)-OH monomer, and the amounts of HBTU, HOBt, and DIEA were adjusted accordingly. The resulting mixture was vortexed briefly and allowed to react for at least 1 minute.
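The reagent amounts in the coupling step scale with the number of equivalents. A minimal sketch is shown below, assuming the 1:1:1:2 amino acid:HBTU:HOBt:DIEA ratio stated in the text is also kept for the 1.5 eq. glycosyl-amino acid coupling. The DIEA molar mass (129.24 g/mol) and density (0.742 g/mL) are standard literature values, not taken from the text, and the function name is hypothetical.

```python
# Literature constants for N,N-diisopropylethylamine (DIEA); not from the text.
DIEA_MW_G_PER_MOL = 129.24
DIEA_DENSITY_G_PER_ML = 0.742


def coupling_recipe(scale_umol, aa_equiv, hobt_molarity=0.1):
    """Reagent amounts for one HBTU/HOBt/DIEA coupling.

    Assumes amino acid : HBTU : HOBt : DIEA = 1 : 1 : 1 : 2, with HOBt
    supplied as a solution of the given molarity in NMP.
    """
    aa_umol = scale_umol * aa_equiv
    diea_umol = 2.0 * aa_umol
    # umol * g/mol = ug; 1 g/mL = 1000 ug/uL, so ug / (density * 1000) = uL
    diea_ul = diea_umol * DIEA_MW_G_PER_MOL / (DIEA_DENSITY_G_PER_ML * 1000.0)
    return {
        "amino_acid_umol": aa_umol,
        "hbtu_umol": aa_umol,
        "hobt_solution_ml": aa_umol / (hobt_molarity * 1000.0),  # 0.1 M = 100 umol/mL
        "diea_umol": diea_umol,
        "diea_ul": diea_ul,
    }


standard = coupling_recipe(50, 5)   # 5 eq. standard coupling
glyco = coupling_recipe(50, 1.5)    # 1.5 eq. Fmoc-Asn(Ac3GlcNAc)-OH coupling
print(round(standard["diea_ul"], 1), standard["hobt_solution_ml"])  # 87.1 2.5
print(round(glyco["diea_ul"], 1), glyco["hobt_solution_ml"])
```

The 5 eq. case reproduces the 87.1 μL DIEA and 2.5 mL HOBt solution given in the text, which is a useful consistency check on the unit conversion.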

The activated amino acid solution was then added to the resin, and the resulting mixture was stirred at room temperature for at least 1 hour. Selected amino acids were double coupled as needed to allow the coupling reaction to proceed to completion. Following the coupling reaction, the activated amino acid solution was drained from the resin with a vacuum manifold, and the resin was subsequently rinsed five times with DMF. The cycles of deprotection and coupling were alternately repeated to give the desired full-length protein.

Acid-labile side-chain protecting groups were globally removed and proteins were cleaved from the resin by stirring the resin for about 4 hours in a solution of phenol (0.5 g), water (500 μL), thioanisole (500 μL), ethanedithiol (250 μL), and triisopropylsilane (100 μL) in trifluoroacetic acid (TFA, 8 mL). Following the cleavage reaction, the TFA solution was drained from the resin, the resin was rinsed with additional TFA, and the resulting solution was concentrated under Ar. Proteins were precipitated from the concentrated TFA solution by addition of diethyl ether (about 45 mL). Following centrifugation, the ether was decanted, and the pellet (containing the crude protein) was stored at −20° C. until purification.

Acetate protecting groups were subsequently removed from the 3-, 4-, and 6-hydroxyl groups of GlcNAc in Asn(GlcNAc)-containing proteins by hydrazinolysis, as described previously [Price et al., J. Am. Chem. Soc. 132, 15359-15367 (2010); and Ficht et al., Chem. Eur. J. 14, 3620-3629 (2008)] and elsewhere herein. The WW domains were purified by reverse-phase HPLC on a C18 column using a linear gradient of water in acetonitrile with 0.2% v/v TFA. The identity of each WW domain was confirmed by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry, and purity was evaluated by analytical HPLC.

Removal of Acetate Protecting Groups on Asn-Linked GlcNAc Residues in Glycosylated Pin1 WW Domain Proteins

Acetate protecting groups were removed from the 3-, 4-, and 6-hydroxyl groups on the Asn-linked GlcNAc residues in proteins g-WW, g-WW-F, g-WW-T, and g-WW-F,T via hydrazinolysis as described previously [Ficht et al., Chem. Eur. J. 14, 3620-3629 (2008)]. Briefly, the crude protein was dissolved in 5% hydrazine in 60 mM aqueous dithiothreitol (sometimes containing up to 50% acetonitrile to facilitate dissolution of the crude protein) and allowed to stand at room temperature for about 1 hour with intermittent agitation. The deprotection reaction was quenched by the addition of about 1 mL TFA and about 20 mL water. The quenched reaction mixture was frozen and lyophilized to give the crude deprotected protein as a white powder.

Purification and Characterization

Immediately prior to purification, the crude proteins were dissolved in 1:1 water:acetonitrile, DMSO, or 8 M GdnHCl, depending on the solubility of the crude protein (8 M GdnHCl was frequently required to dissolve the crude glycosylated proteins, even though these proteins were readily soluble in water after purification). Proteins were purified by preparative reverse-phase HPLC on a C18 column using a linear gradient of water in acetonitrile with 0.2% v/v TFA. HPLC fractions containing the desired protein product were pooled, frozen, and lyophilized. Polypeptides were identified by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry, and purity was established by analytical HPLC. Further data from these studies can be found in Culyba et al., Science 331, 571-575 (2011).

Circular Dichroism Spectroscopy

Measurements were made with an Aviv 62A DS Circular Dichroism Spectrometer using quartz cuvettes with a 0.1 cm path length. Protein solutions were prepared in 10 mM sodium phosphate buffer, pH 7, and protein concentrations were determined spectroscopically from tyrosine and tryptophan absorbance at 280 nm in 6 M guanidine hydrochloride + 20 mM sodium phosphate (ε_(Trp)=5690 M⁻¹cm⁻¹, ε_(Tyr)=1280 M⁻¹cm⁻¹) [Price et al., J. Am. Chem. Soc. 132, 15359-15367 (2010); and Edelhoch, Biochemistry 6, 1948-1954 (1967)]. CD spectra were obtained by monitoring molar ellipticity from 340 to 200 nm with a 5-second averaging time. Variable-temperature CD data were obtained by monitoring molar ellipticity at 227 nm from 0.2 to 98.2° C. at 2° C. intervals, with a 90-second equilibration time between data points and a 30-second averaging time.
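The concentration determination above is a Beer-Lambert calculation with an extinction coefficient built from the Trp and Tyr contributions. The sketch below uses the ε values given in the text; the residue counts, absorbance, and the 1 cm absorbance path length are hypothetical, since the text does not state them.

```python
EPS_TRP = 5690.0  # M^-1 cm^-1 at 280 nm, value given in the text
EPS_TYR = 1280.0  # M^-1 cm^-1 at 280 nm, value given in the text


def concentration_um(a280, n_trp, n_tyr, path_cm=1.0):
    """Protein concentration (uM) from A280 by the Beer-Lambert law (A = eps * c * l)."""
    eps = n_trp * EPS_TRP + n_tyr * EPS_TYR   # molar extinction coefficient of the protein
    return a280 / (eps * path_cm) * 1e6       # mol/L -> umol/L


# Hypothetical example: a domain with 2 Trp and 1 Tyr, A280 = 0.1266 in a 1 cm cell.
print(round(concentration_um(0.1266, n_trp=2, n_tyr=1), 2))  # 10.0
```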

Variable temperature CD data were fit to the following model for two-state thermally induced unfolding transitions:

$\begin{matrix} {\lbrack\theta\rbrack = \frac{\left( {D_{0} + {D_{1} \cdot T}} \right) + {K_{f}\left( {N_{0} + {N_{1} \cdot T}} \right)}}{1 + K_{f}}} & (10) \end{matrix}$

where T is temperature in Kelvin, D₀ is the y-intercept and D₁ is the slope of the post-transition (unfolded-state) baseline; N₀ is the y-intercept and N₁ is the slope of the pre-transition (folded-state) baseline; and K_(f) is the temperature-dependent folding equilibrium constant. K_(f) is related to the temperature-dependent free energy of folding ΔG_(f)(T) according to the following equation:

$\begin{matrix} {K_{f} = {\exp \left\lbrack \frac{{- \Delta}\; {G_{f}(T)}}{RT} \right\rbrack}} & (11) \end{matrix}$
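Equations 10 and 11 together give the observed ellipticity as a population-weighted average of the folded and unfolded baselines. A minimal forward model is sketched below with hypothetical function names and baseline values; a fitting routine would wrap this function and adjust the baseline and ΔG_(f)(T) parameters.

```python
import math

R_KCAL = 0.0019872  # kcal/(mol*K), value given in the text


def folding_constant(dg_f, temp_k):
    """Equation 11: K_f = exp(-dG_f / RT)."""
    return math.exp(-dg_f / (R_KCAL * temp_k))


def cd_signal(temp_k, dg_f, d0, d1, n0, n1):
    """Equation 10: ellipticity of a two-state protein at temperature temp_k (K).

    (d0, d1) define the unfolded (post-transition) baseline and (n0, n1) the
    folded (pre-transition) baseline; dg_f is the folding free energy at temp_k.
    """
    k_f = folding_constant(dg_f, temp_k)
    unfolded = d0 + d1 * temp_k
    folded = n0 + n1 * temp_k
    return (unfolded + k_f * folded) / (1.0 + k_f)


# At dG_f = 0 (T = Tm) the signal is the mean of the two baselines.
print(cd_signal(330.0, 0.0, -1000.0, 0.0, -15000.0, 0.0))  # -8000.0
```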

where R is the universal gas constant (0.0019872 kcal/mol/K). The midpoint of the thermal unfolding transition (or melting temperature T_(m)) was calculated by fitting ΔG_(f)(T) to either of two equations. The first equation is derived from the van't Hoff relationship:

$\begin{matrix} {{\Delta \; {G_{f}(T)}} = {{\frac{\Delta \; {H\left( T_{m} \right)}}{T_{m}}\left( {T_{m} - T} \right)} + {\Delta \; {C_{p}\left\lbrack {T - T_{m} - {T\; {\ln \left( \frac{T}{T_{m}} \right)}}} \right\rbrack}}}} & (12) \end{matrix}$

where ΔH(T_(m)) is the enthalpy of folding at the melting temperature and ΔC_(p) is the heat capacity change of folding (ΔH(T_(m)), ΔC_(p), and T_(m) are parameters of the fit). The second equation represents ΔG_(f)(T) as a second-order Taylor series expansion about the melting temperature:
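Equation 12 can be evaluated directly once ΔH(T_(m)), ΔC_(p), and T_(m) are known. The sketch below uses hypothetical parameter values; by construction ΔG_(f) is zero at T = T_(m), negative (folding favored) below it for a negative folding enthalpy, and positive above it.

```python
import math


def dg_vant_hoff(temp_k, dh_tm, tm_k, dcp):
    """Equation 12: folding free energy (kcal/mol) at temp_k.

    dh_tm: folding enthalpy at Tm (kcal/mol); dcp: heat capacity change of
    folding (kcal/mol/K); tm_k: melting temperature (K).
    """
    return (dh_tm / tm_k) * (tm_k - temp_k) + dcp * (
        temp_k - tm_k - temp_k * math.log(temp_k / tm_k)
    )


# Hypothetical fold: dH(Tm) = -20 kcal/mol, dCp = -0.3 kcal/mol/K, Tm = 330 K.
for t in (300.0, 315.0, 330.0, 345.0):
    print(t, round(dg_vant_hoff(t, -20.0, 330.0, -0.3), 3))
```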

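$\begin{matrix} {{\Delta G_{f}(T)} = {\Delta G_{0} + {\Delta G_{1} \cdot \left( T - T_{m} \right)} + {\Delta G_{2} \cdot \left( T - T_{m} \right)^{2}}}} & (13) \end{matrix}$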

in which ΔG₀, ΔG₁, and ΔG₂ are parameters of the fit and T_(m) is a constant obtained from the van't Hoff fit (in equation 12). The ΔG_(f) values displayed in FIG. 4F for each Pin WW domain protein were obtained by averaging the ΔG_(f) values (calculated at 328.15 K using equation 13) from each of three or more replicate variable temperature CD studies on the same protein.
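The averaging step described above can be sketched as follows: equation 13 is evaluated at 328.15 K for each replicate fit and the results are averaged. The replicate parameter sets are hypothetical placeholders, not data from the studies.

```python
def dg_taylor(temp_k, dg0, dg1, dg2, tm_k):
    """Equation 13: second-order Taylor expansion of dG_f(T) about Tm."""
    x = temp_k - tm_k
    return dg0 + dg1 * x + dg2 * x * x


def mean_dg_at(temp_k, replicate_fits):
    """Average dG_f at temp_k over replicate fits given as (dg0, dg1, dg2, tm_k)."""
    values = [dg_taylor(temp_k, *fit) for fit in replicate_fits]
    return sum(values) / len(values)


# Hypothetical parameter sets from three replicate melts of one protein.
fits = [(0.02, 0.10, -0.001, 338.0), (-0.01, 0.11, -0.001, 337.5), (0.00, 0.09, -0.001, 338.4)]
print(round(mean_dg_at(328.15, fits), 3))
```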

CD spectra and variable-temperature CD data for Pin WW domain proteins WW, g-WW, WW-F, g-WW-F, WW-T, g-WW-T, WW-F,T, and g-WW-F,T appear in the Supplemental Information, along with the parameters from equations 12 and 13 that were used to fit the variable-temperature CD data and the standard error of each fitted parameter. These standard errors were used to estimate, by propagation of error, the uncertainty in the average ΔG_(f) values and in the folding and unfolding rate ratios shown in FIG. 4F. Further data from these studies can be found in Culyba et al., Science 331, 571-575 (2011).
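The text does not spell out the propagation-of-error formulas. One common first-order approach is sketched below, assuming the fitted parameters are uncorrelated (covariance terms omitted) and that replicate values are independent; the standard errors used in the example are hypothetical.

```python
import math


def dg_taylor_se(temp_k, tm_k, se0, se1, se2):
    """Standard error of equation 13 at temp_k from parameter standard errors.

    Assumes uncorrelated parameters (covariance terms omitted) and treats
    Tm as an exact constant, as equation 13 does.
    """
    x = temp_k - tm_k
    # partial derivatives of dG_f with respect to (dG0, dG1, dG2) are (1, x, x**2)
    return math.sqrt(se0 ** 2 + (x * se1) ** 2 + (x * x * se2) ** 2)


def mean_se(errors):
    """Uncertainty of an average of independent values, propagated from their errors."""
    return math.sqrt(sum(e * e for e in errors)) / len(errors)


replicate_se = [dg_taylor_se(328.15, tm, 0.02, 0.004, 0.0001) for tm in (338.0, 337.5, 338.4)]
print(round(mean_se(replicate_se), 4))
```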

Laser Temperature Jump Studies

Relaxation times following a rapid laser-induced temperature jump of about 12° C. were measured for 50 μM solutions of Pin WW domain proteins WW, g-WW, WW-F, g-WW-F, WW-T, g-WW-T, WW-F,T, and g-WW-F,T in 20 mM sodium phosphate (pH 7) using a nanosecond laser temperature-jump apparatus, as described previously [Ballew et al., Rev. Sci. Instrum. 67, 3694-3699 (1996); Ballew et al., Proc. Natl. Acad. Sci. USA 93, 5759-5764 (1996); Ervin et al., J. Photochem. Photobiol. sect. B 54, 1-15 (2000); Jäger et al., J. Mol. Biol. 311, 373-393 (2001)]. The fluorescence decay of a Trp residue in each protein was monitored after the temperature jump at each of several temperatures.

The relaxation traces, each representing the average of at least 10 individual temperature-jump measurements, were obtained by fitting the shape f of the fluorescence decay at each time t to a linear combination of the fluorescence decay shapes before (f₁) and after (f₂) the temperature jump:

$\begin{matrix} {{{f(t)} = {{{a_{1}(t)} \cdot f_{1}} + {{a_{2}(t)} \cdot f_{2}}}},} & (14) \end{matrix}$

where a₁(t) and a₂(t) are the coefficients of the linear combination describing the relative contributions of f₁ and f₂ to the shape of the fluorescence decay at time t [Jäger et al., J. Mol. Biol. 311, 373-393 (2001)]. Then, the relaxation of the protein to equilibrium at the new temperature following the laser-induced temperature jump can be represented as χ₁(t):

$\begin{matrix} {{{\chi_{1}(t)} = \frac{a_{1}(t)}{{a_{1}(t)} + {a_{2}(t)}}},} & (15) \end{matrix}$

plotted as a function of time for each protein at several temperatures [Ballew et al., Proc. Natl. Acad. Sci. USA, 93, 5759-5764 (1996); and Ervin et al., J. Photochem. Photobiol. sect. B 54, 1-15 (2000)].
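Equations 14 and 15 amount to a two-component linear decomposition at each time point. The sketch below assumes an ordinary linear least-squares fit (solved via the 2×2 normal equations); the text does not specify the fitting method, and the decay shapes are hypothetical.

```python
def decompose(f, f1, f2):
    """Least-squares a1, a2 such that f ~= a1*f1 + a2*f2 (equation 14).

    f, f1, f2 are decay shapes sampled on the same time grid.
    """
    s11 = sum(x * x for x in f1)
    s22 = sum(y * y for y in f2)
    s12 = sum(x * y for x, y in zip(f1, f2))
    b1 = sum(x * z for x, z in zip(f1, f))
    b2 = sum(y * z for y, z in zip(f2, f))
    det = s11 * s22 - s12 * s12          # zero only if f1 and f2 are proportional
    a1 = (b1 * s22 - b2 * s12) / det     # Cramer's rule on the 2x2 normal equations
    a2 = (s11 * b2 - s12 * b1) / det
    return a1, a2


def chi1(a1, a2):
    """Equation 15: fraction of the pre-jump decay shape in the observed decay."""
    return a1 / (a1 + a2)


# Hypothetical decay shapes before (f1) and after (f2) the jump.
f1 = [1.0, 0.60, 0.30, 0.10]
f2 = [1.0, 0.40, 0.10, 0.02]
f_mid = [0.25 * p + 0.75 * q for p, q in zip(f1, f2)]
print(round(chi1(*decompose(f_mid, f1, f2)), 6))  # 0.25
```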

The relaxation traces at each temperature were then fit to the following equation:

$\begin{matrix} {{{\chi (t)} = {{{C_{1} \cdot \exp}\left\lfloor \frac{- \left( {t - x_{0}} \right)}{\tau} \right\rfloor} + C_{2}}},} & (16) \end{matrix}$

where C₁ is the amplitude of the relaxation, C₂ is a constant offset (the value of χ(t) at long times), x₀ is a constant that sets the time origin to the instant of the temperature jump, and τ is the relaxation time, the inverse of the observed rate constant k_(obs) (k_(obs)=1/τ). Using the temperature-dependent equilibrium constant K_(f) for each protein (from the variable-temperature CD studies), the folding (k_(f)) and unfolding (k_(u)) rate constants can be extracted from k_(obs) according to the following equations:

$\begin{matrix} {k_{obs} = {k_{f} + k_{u}}} & (17) \\ {K_{f} = \frac{k_{f}}{k_{u}}} & (18) \\ {k_{f} = {k_{obs} \cdot \left\lbrack {1 - \frac{1}{K_{f} + 1}} \right\rbrack}} & (19) \end{matrix}$
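Equations 17 to 19 split the observed relaxation rate into its folding and unfolding components. A minimal sketch with a hypothetical relaxation time and equilibrium constant:

```python
def rates_from_relaxation(tau_s, k_eq_f):
    """Equations 17-19: folding and unfolding rate constants (s^-1).

    tau_s: relaxation time from equation 16 (s); k_eq_f: folding equilibrium
    constant K_f at the final temperature (from the CD fits).
    """
    k_obs = 1.0 / tau_s                            # k_obs = 1/tau
    k_f = k_obs * (1.0 - 1.0 / (k_eq_f + 1.0))     # equation 19
    k_u = k_obs - k_f                              # equation 17
    return k_f, k_u


# Hypothetical: tau = 10 us and K_f = 4.
k_f, k_u = rates_from_relaxation(10e-6, 4.0)
print(round(k_f), round(k_u), round(k_f / k_u, 6))  # 80000 20000 4.0
```

The ratio k_(f)/k_(u) recovers K_(f), as required by equation 18.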

The folding rate constants for each protein can then be fit as a function of temperature to the following Kramers-model equation [Kramers, Physica 7, 284 (1940); Lapidus et al., Proc. Natl. Acad. Sci. USA 97, 7220-7225 (2000); Hänggi et al., Rev. Mod. Phys. 62, 251-341 (1990)]:

$\begin{matrix} {{{k_{f}(T)} = {{{v\left( {59{^\circ}\mspace{14mu} {C.}} \right)} \cdot \frac{\eta \left( {59{^\circ}\mspace{14mu} {C.}} \right)}{\eta (T)}}\exp \left\lfloor {- \frac{{\Delta \; G_{0}} + {\Delta \; {G_{1} \cdot \left( {T - T_{m}} \right)}} + {\Delta \; {G_{2} \cdot \left( {T - T_{m}} \right)^{2}}}}{RT}} \right\rfloor}},} & (20) \end{matrix}$

in which the temperature-dependent free energy of activation for folding, ΔG^(†)_(f), is represented as a second-order Taylor series expansion about the melting temperature T_(m), and ΔG^(†)_(0), ΔG^(†)_(1), and ΔG^(†)_(2) are parameters of the fit. The pre-exponential term in equation 20 represents the viscosity-corrected frequency ν of the characteristic diffusional folding motion at the barrier [Bieri et al., Proc. Natl. Acad. Sci. USA 96, 9597-9601 (1999); Ansari et al., Science 256, 1796-1798 (1992)]; at 59° C., ν=5×10⁵ s⁻¹ [Fuller et al., Proc. Natl. Acad. Sci. USA 106, 11067-11072 (2009)]. η(59° C.) is the solvent viscosity at 59° C. and η(T) is the solvent viscosity at temperature T, both calculated with equation 21:

$\begin{matrix} {{\eta (T)} = {A \cdot 10^{\frac{B}{T - C}}}} & (21) \end{matrix}$

where A=2.41×10⁻⁵ Pa·s, B=247.8 K, and C=140 K [Weast, CRC Handbook of Chemistry and Physics; CRC Press: Boca Raton, 1982].
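Equations 20 and 21 can be combined into a single function for the folding rate constant. The sketch below takes ν(59° C.) = 5×10⁵ s⁻¹ from the text and uses the water-viscosity constants of equation 21 (with A in Pa·s), which give about 1.0 mPa·s at 20° C. The activation free energy coefficients in the example are hypothetical.

```python
import math

R_KCAL = 0.0019872   # kcal/(mol*K)
NU_59C = 5e5         # s^-1, pre-exponential frequency at 59 C (from the text)
T_59C = 332.15       # K


def water_viscosity(temp_k, a=2.41e-5, b=247.8, c=140.0):
    """Equation 21: viscosity (Pa*s) = A * 10**(B / (T - C))."""
    return a * 10 ** (b / (temp_k - c))


def kramers_kf(temp_k, dga0, dga1, dga2, tm_k):
    """Equation 20: folding rate constant (s^-1) with a viscosity-scaled prefactor.

    dga0..dga2 are the Taylor coefficients of the activation free energy
    (kcal/mol, kcal/mol/K, kcal/mol/K^2) about tm_k.
    """
    x = temp_k - tm_k
    dg_act = dga0 + dga1 * x + dga2 * x * x
    prefactor = NU_59C * water_viscosity(T_59C) / water_viscosity(temp_k)
    return prefactor * math.exp(-dg_act / (R_KCAL * temp_k))


print(round(water_viscosity(293.15) * 1e3, 3))  # about 1.0 (mPa*s) at 20 C
print(kramers_kf(328.15, 4.0, 0.02, 0.0, 335.0))  # hypothetical barrier parameters
```

With a zero activation free energy the rate at 59° C. equals ν exactly, since the viscosity ratio is then 1.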

The parameters for equations 13 and 20 were used to calculate the folding and unfolding rate ratios at 328.15 K for Pin WW domain proteins WW, g-WW, WW-F, g-WW-F, WW-T, g-WW-T, WW-F,T, and g-WW-F,T shown in FIG. 4F.
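One way to form such ratios is sketched below. It assumes both proteins share the same viscosity-corrected prefactor in equation 20, so the folding-rate ratio depends only on the difference in activation free energies. The unfolding-rate ratio then follows from k_(u) = k_(f)/K_(f) (equation 18) with K_(f) from equations 11 and 13. The free energy values in the example are hypothetical.

```python
import math

R_KCAL = 0.0019872  # kcal/(mol*K)


def rate_ratios(temp_k, dg_act_a, dg_act_b, dg_f_a, dg_f_b):
    """Folding and unfolding rate ratios (protein a / protein b) at temp_k.

    dg_act_*: folding activation free energies from equation 20 (kcal/mol);
    dg_f_*: folding free energies from equation 13 (kcal/mol). The viscosity-
    corrected prefactor is assumed identical for both proteins, so it cancels.
    """
    rt = R_KCAL * temp_k
    kf_ratio = math.exp(-(dg_act_a - dg_act_b) / rt)   # k_f,a / k_f,b
    keq_ratio = math.exp(-(dg_f_a - dg_f_b) / rt)      # K_f,a / K_f,b (equation 11)
    ku_ratio = kf_ratio / keq_ratio                     # k_u = k_f / K_f (equation 18)
    return kf_ratio, ku_ratio


# Hypothetical: glycosylation lowers dG_f by 0.5 kcal/mol, barrier unchanged.
print(rate_ratios(328.15, 5.0, 5.0, -1.5, -1.0))
```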

Each of the patents, patent applications and articles cited herein is incorporated by reference. The use of the article “a” or “an” is intended to include one or more.

The foregoing description and the examples are intended as illustrative and are not to be taken as limiting. Still other variations within the spirit and scope of this invention are possible and will readily present themselves to those skilled in the art.

SUPPLEMENTAL INFORMATION SEQUENCES OF ANTIBODY FC PORTIONS FOR PREPARATION OF ENHANCED AROMATIC SEQUONS SEQUENCE PORTIONS TO BE REVISED ARE UNDERLINED >drugbank_drug|DB00078 Ibritumomab - Mouse Anti-CD20 Heavy chain (SEQ ID NO: 206) 1 QAYLQQSGAELVRPGASVKMSCKASGYTFTSYNMHWVKQTPRQGLEWIGAIYPGNGDTSY NQKFKGKATLTVDKSSSTAYMQLSSLTSEDSAVYFCARVVYYSNSYWYFDVWGTGTTVTV SAPSVYPLAPVCGDTTGSSVTLGCLVKGYFPEPVTLTWNSGSLSSGVHTFPAVLQSDLYT LSSSVTVTSSTWPSQSITCNVAHPASSTKVDKKIEPRGPTIKPCPPCKCPAPNLLGGPSV FIFPPKIKDVLMISLSPIVTCVVVDVSEDDPDVQISWFVNNVEVHTAQTQTHREDYNSTL RVVSALPIQHQDWMSGKEFKCKVNNKDLPAPIERTISKPKGSVRAPQVYVLPPPEEEMTK KQVTLTCMVTDFMPEDIYVEWTNNGKTELNYKNTEPVLDSDGSYFMYSKLRVEKKNWVER NSYSCSVVHEGLHNHHTTKSFSR >drugbank_drug|DB00078 Ibritumomab - Mouse Anti-CD20 Heavy chain (SEQ ID NO: 207) 2 QAYLQQSGAELVRPGASVKMSCKASGYTFTSYNMHWVKQTPRQGLEWIGAIYPGNGDTSY NQKFKGKATLTVDKSSSTAYMQLSSLTSEDSAVYFCARVVYYSNSYWYFDVWGTGTTVTV SAPSVYPLAPVCGDTTGSSVTLGCLVKGYFPEPVTLTWNSGSLSSGVHTFPAVLQSDLYT LSSSVTVTSSTWPSQSITCNVAHPASSTKVDKKIEPRGPTIKPCPPCKCPAPNLLGGPSV FIFPPKIKDVLMISLSPIVTCVVVDVSEDDPDVQISWFVNNVEVHTAQTQTHREDYNSTL RVVSALPIQHQDWMSGKEFKCKVNNKDLPAPIERTISKPKGSVRAPQVYVLPPPEEEMTK KQVTLTCMVTDFMPEDIYVEWTNNGKTELNYKNTEPVLDSDGSYFMYSKLRVEKKNWVER NSYSCSVVHEGLHNHHTTKSFSR >drugbank_drug|DB00078 Ibritumomab - Mouse Anti-CD20 Light chain (SEQ ID NO: 208) 1 QIVLSQSPAILSASPGEKVTMTCRASSSVSYMHWYQQKPGSSPKPWIYAPSNLASGVPAR FSGSGSGTSYSLTISRVEAEDAATYYCQQWSFNPPTFGAGTKLELKRADAAPTVFIFPPS DEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTL SKADYEKHKVYACEVTHQGLSSPVTKSFN >drugbank_drug|DB00078 Ibritumomab - Mouse Anti-CD20 Light chain (SEQ ID NO: 209) 2 QIVLSQSPAILSASPGEKVTMTCRASSSVSYMHWYQQKPGSSPKPWIYAPSNLASGVPAR FSGSGSGTSYSLTISRVEAEDAATYYCQQWSFNPPTFGAGTKLELKRADAAPTVFIFPPS DEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTL SKADYEKHKVYACEVTHQGLSSPVTKSFN >drugbank_drug|DB00028 Immune globulin - IGG1 (SEQ ID NO: 210) PSALTQPPSASGSLGQSVTISCTGTSSDVGGYNYVSWYQQHAGKAPKVIIYEVNKRPSGV PDRFSGSKSGNTASLTVSGLQAEDEADYYCSSYEGSDNFVFGTGTKVTVLGQPKANPTVT 
LFPPSSEELQANKATEVCLISDFYPGAVTVAWKADGSPVKAGVETTKPSKQSNNKYAASS YLSLTPEQWKSHRSYSCQVTHEGSTVEKTVAPTECSPLVLQESGPGLVKPSEALSLTCTV SGDSINTILYYWSWIRQPPGKGLEWIGYIYYSGSTYGNPSLKSRVTISVNTSKNQFYSKL SSVTAADTAVYYCARVPLVVNPWGQGTLVTVSSASTKGPSVFPLAPSSKSTSGGTAALGC LVKDYFPQPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNH KPSNTKVDKRVAPELLGGPSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPQVKFNWYV DGVQVHNAKTKPREQQYNSTYRVVSVLTVLHQNWLDGKEYKCKVSNKALPAPIEKTISKA KGQPREPQVYTLPPSREEMTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLD SDGSFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHYTQKSLSL >drugbank_drug|DB00028 Immune globulin - IgA2 (SEQ ID NO: 211) ELVMTQSPSSLSASVGDRVNIACRASQGISSALAWYQQKPGKAPRLLIYDASNLESGVPS RFSGSGSGTDFTLTISSLQPEDFAIYYCQQFNSYPLTFGGGTKVEIKRTVAAPSVFIFPP SDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLT LSKADYEKHKVYACEVTHQGLSSPVTKSFNRGECQVKLLEQSGAEVKKPGASVKVSCKAS GYSFTSYGLHWVRQAPGQRLEWMGWISAGTGNTKYSQKFRGRVTFTRDTSATTAYMGLSS LRPEDTAVYYCARDPYGGGKSEFDYWGQGTLVTVSSASPTSPKVFPLSLDSTPQDGNVVV ACLVQGFFPQEPLSVTWSESGQNVTARNFPPSQDASGDLYTTSSQLTLPATQCPDGKSVT CHVKHYTNPSQDVTVPCPVPPPPPCCHPRLSLHRPALEDLLLGSEANLTCTLTGLRDASG ATFTWTPSSGKSAVQGPPERDLCGCYSVSSVLPGCAQPWNHGETFTCTAAHPELKTPLTA NITKSGNTFRPEVHLLPPPSEELALNELVTLTCLARGFSPKDVLVRWLQGSQELPREKYL TWASRQEPSQGTTTFAVTSILRVAAEDWKKGDTFSCMVGHEALPLAFTQKTIDRLAGKPT HVNVSVVMAEVDGTCY >drugbank_drug|DB00005 Etanercept - DB00005 sequence (SEQ ID NO: 212) LPAQVAFTPYAPEPGSTCRLREYYDQTAQMCCSKCSPGQHAKVECTKTSDTVCDSCEDST YTQLWNWVPECLSCGSRCSSDQVETQACTREQNRICTCRPGWYCALSKQEGCRLCAPLRK CRPGFGVARPGTETSDVVCKPCAPGTFSNTTSSTDICRPHQICNVVAIPGNASMDAVCTS TSPTRSMAPGAVHLPQPVSTRSQHTQPTPEPSTAPSTSFLLPMGPSPPAEGSTGDEPKSC DKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPQVKFNWYVD GVQVHNAKTKPREQQYNSTYRVVSVLTVLHQNWLDGKEYKCKVSNKALPAPIEKTISKAK GQPREPQVYTLPPSREEMTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDS DGSFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00087 Alemtuzumab - 1CE1: H CAMPATH-1H: Heavy Chain 1 (SEQ ID NO: 213) QVQLQESGPGLVRPSQTLSLTCTVSGFTFTDFYMNWVRQPPGRGLEWIGFIRDKAKGYTT 
EYNPSVKGRVTMLVDTSKNQFSLRLSSVTAADTAVYYCAREGHTAAPFDYWGQGSLVTVS SASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQS SGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKVEPKSCDKTHTCPPCPAPELLG GPSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQY NSTYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRD ELTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSR WQQGNVFSCSVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00087 Alemtuzumab - 1CE1: H CAMPATH-1H: Heavy Chain 2 (SEQ ID NO: 214) QVQLQESGPGLVRPSQTLSLTCTVSGFTFTDFYMNWVRQPPGRGLEWIGFIRDKAKGYTT EYNPSVKGRVTMLVDTSKNQFSLRLSSVTAADTAVYYCAREGHTAAPFDYWGQGSLVTVS SASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQS SGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKVEPKSCDKTHTCPPCPAPELLG GPSVFLEPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQY NSTYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRD ELTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSR WQQGNVFSCSVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00087 Alemtuzumab - 1CE1: L CAMPATH-1H: Light Chain 1 (SEQ ID NO: 215) DIQMTQSPSSLSASVGDRVTITCKASQNIDKYLNWYQQKPGKAPKLLIYNTNNLQTGVPS RFSGSGSGTDFTFTISSLQPEDIATYYCLQHISRPRTFGQGTKVEIKRTVAAPSVFIFPP SDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLT LSKADYEKHKVYACEVTHQGLSSPVTKSFNR >drugbank_drug|DB00087 Alemtuzumab - 1CE1: L CAMPATH-1H: Light Chain 2 (SEQ ID NO: 216) DIQMTQSPSSLSASVGDRVTITCKASQNIDKYLNWYQQKPGKAPKLLIYNTNNLQTGVPS RFSGSGSGTDFTFTISSLQPEDIATYYCLQHISRPRTFGQGTKVEIKRTVAAPSVFIFPP SDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLT LSKADYEKHKVYACEVTHQGLSSPVTKSFNR >drugbank_drug|DB00113 Arcitumomab - 1clo: Anti-CEA antigen light chain 1 (SEQ ID NO: 217) QTVLSQSPAILSASPGEKVTMTCRASSSVTYIHWYQQKPGSSPKSWIYATSNLASGVPAR FSGSGSGTSYSLTISRVEAEDAATYYCQHWSSKPPTFGGGTKLEIKRADAAPTVSIFPPS SEQLTSGGASVVCFLNNFYPKDINVKWKIDGSERQNGVLNSWTDQDSKDSTYSMSSTLTL TKDEYERHNSYTCEATHKTSTSPIVKSFNRNEC >drugbank_drug|DB00113 Arcitumomab - 1clo: Anti-CEA antigen light chain 2 (SEQ ID NO: 218) 
QTVLSQSPAILSASPGEKVTMTCRASSSVTYIHWYQQKPGSSPKSWIYATSNLASGVPAR FSGSGSGTSYSLTISRVEAEDAATYYCQHWSSKPPTEGGGTKLEIKRADAAPTVSIFPPS SEQLTSGGASVVCFLNNFYPKDINVKWKIDGSERQNGVLNSWTDQDSKDSTYSMSSTLTL TKDEYERHNSYTCEATHKTSTSPIVKSFNRNEC >drugbank_drug|DB00113 Arcitumomab - 1clo: Anti-CEA heavy chain 1 (SEQ ID NO: 219) EVKLVESGGGLVQPGGSLRLSCATSGFTFTDYYMNWVRQPPGKALEWLGFIGNKANGYTT EYSASVKGRFTISRDKSQSILYLQMNTLRAEDSATYYCTRDRGLRFYFDYWGQGTTLTVS SAKTTPPSVYPLAPGSAAQTNSMVTLGCLVKGYFPEPVTVTWNSGSLSSGVHTFPAVLQS DLYTLSSSVTVPSSPRPSETVTCNVAHPASSTKVDKKIVPRDCPPCKCPAPNLLGGPSVF IFPPKIKDVLMISLSPIVTCVVVDVSEDDPDVQISWFVNNVEVHTAQTQTHREDYNSTLR VVSALPIQHQDWMSGKEFKCKVNNKDLPAPIERTISKPKGSVRAPQVYVLPPPEEEMTKK QVTLTCMVTDFMPEDIYVEWTNNGKTELNYKNTEPVLDSDGSYFMYSKLRVEKKNWVERN SYSCSVVHEGLHNHHTTKSFSR >drugbank_drug|DB00113 Arcitumomab - 1clo: Anti-CEA heavy chain 2 (SEQ ID NO: 220) EVKLVESGGGLVQPGGSLRLSCATSGFTFTDYYMNWVRQPPGKALEWLGFIGNKANGYTT EYSASVKGRFTISRDKSQSILYLQMNTLRAEDSATYYCTRDRGLRFYFDYWGQGTTLTVS SAKTTPPSVYPLAPGSAAQTNSMVTLGCLVKGYFPEPVTVTWNSGSLSSGVHTFPAVLQS DLYTLSSSVTVPSSPRPSETVTCNVAHPASSTKVDKKIVPRDCPPCKCPAPNLLGGPSVF IFPPKIKDVLMISLSPIVTCVVVDVSEDDPDVQISWFVNNVEVHTAQTQTHREDYNSTLR VVSALPIQHQDWMSGKEFKCKVNNKDLPAPIERTISKPKGSVRAPQVYVLPPPEEEMTKK QVTLTCMVTDEMPEDIYVEWTNNGKTELNYKNTEPVLDSDGSYFMYSKLRVEKKNWVERN SYSCSVVHEGLHNHHTTKSFSR >drugbank_drug|DB00103 Agalsidase beta - DB00103 sequence (SEQ ID NO: 221) LDNGLARTPTMGWLHWERFMCNLDCQEEPDSCISEKLFMEMAELMVSEGWKDAGYEYLCI DDCWMAPQRDSEGRLQADPQRFPHGIRQLANYVHSKGLKLGIYADVGNKTCAGFPGSFGY YDIDAQTFADWGVDLLKFDGCYCDSLENLADGYKHMSLALNRTGRSIVYSCEWPLYMWPF QKPNYTEIRQYCNHWRNFADIDDSWKSIKSILDWTSFNQERIVDVAGPGGWNDPDMLVIG NFGLSWNQQVTQMALWAIMAAPLFMSNDLRHISPQAKALLQDKDVIAINQDPLGKQGYQL RQGDNFEVWERPLSGLAWAVAMINRQEIGGPRSYTIAVASLGKGVACNPACFITQLLPVK RKLGFYEWTSRLRSHINPTGTVLLQLENTMQMSLKDLL >drugbank_drug|DB00064 Serum albumin iodinated - DB00064 sequence (SEQ ID NO: 222) DAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAE NCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNPNLPRLVRPEV 
DVMCTAFHDNEETFLKKYLYEIARRHPYFYAPELLFFAKRYKAAFTECCQAADKAACLLP KLDELRDEGKASSAKQRLKCASLQKFGERAFKAWAVARLSQRFPKAEFAEVSKLVTDLTK VHTECCHGDLLECADDRADLAKYICENQDSISSKLKECCEKPLLEKSHCIAEVENDEMPA DLPSLAADFVESKDVCKNYAEAKDVFLGMFLYEYARRHPDYSVVLLLRLAKTYETTLEKC CAAADPHECYAKVFDEFKPLVEEPQNLIKQNCELFEQLGEYKFQNALLVRYTKKVPQVST PTLVEVSRNLGKVGSKCCKHPEAKRMPCAEDYLSVVLNQLCVLHEKTPVSDRVTKCCTES LVNRRPCFSALEVDETYVPKEFNAETFTFHADICTLSEKERQIKKQTALVELVKHKPKAT KEQLKAVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAALGL >drugbank_drug|DB00043 Omalizumab - Anti IgE antibody VH domain chain 1 (SEQ ID NO: 223) EVQLVESGGGLVQPGGSLRLSCAVSGYSITSGYSWNWIRQAPGKGLEWVASITYDGSTNY ADSVKGRFTISRDDSKNTFYLQMNSLRAEDTAVYYCARGSHYFGHWHFAVWGQGTLVTVS SGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLY SLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKAEPKSCDKTHTCPPCPAPELLGGPSV FLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTY RVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTK NQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQG NVFSCSVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00043 Omalizumab - Anti IgE antibody VH domain chain 2 (SEQ ID NO: 224) EVQLVESGGGLVQPGGSLRLSCAVSGYSITSGYSWNWIRQAPGKGLEWVASITYDGSTNY ADSVKGRFTISRDDSKNTFYLQMNSLRAEDTAVYYCARGSHYFGHWHFAVWGQGTLVTVS SGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLY SLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKAEPKSCDKTHTCPPCPAPELLGGPSV FLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTY RVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTK NQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQG NVFSCSVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00043 Omalizumab - Anti IgE antibody VL domain chain 1 (SEQ ID NO: 225) DIQLTQSPSSLSASVGDRVTITCRASQSVDYDGDSYMNWYQQKPGKAPKLLIYAASYLES GVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQSHEDPYTFGQGTKVEIKRTVAAPSVF IFPPSDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLS STLTLSKADYEKHKVYACEVTHQGLSSPVTKSFNR >drugbank_drug|DB00043 Omalizumab - Anti IgE antibody VL domain chain 2 (SEQ ID NO: 226) 
DIQLTQSPSSLSASVGDRVTITCRASQSVDYDGDSYMNWYQQKPGKAPKLLIYAASYLES GVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQSHEDPYTFGQGTKVEIKRTVAAPSVF IFPPSDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLS STLTLSKADYEKHKVYACEVTHQGLSSPVTKSFNR >drugbank_drug|DB00100 Coagulation Factor IX - DB00100 sequence (SEQ ID NO: 227) YNSGKLEEFVQGNLERECMEEKCSFEEAREVFENTERTTEFWKQYVDGDQCESNPCLNGG SCKDDINSYECWCPFGFEGKNCELDVTCNIKNGRCEQFCKNSADNKVVCSCTEGYRLAEN QKSCEPAVPFPCGRVSVSQTSKLTRAEAVFPDVDYVNSTEAETILDNITQSTQSFNDFTR VVGGEDAKPGQFPWQVVLNGKVDAFCGGSIVNEKWIVTAAHCVETGVKITVVAGEHNIEE TEHTEQKRNVIRIIPHHNYNAAINKYNHDIALLELDEPLVLNSYVTPICIADKEYTNIFL KFGSGYVSGWGRVFHKGRSALVLQYLRVPLVDRATCLRSTKFTIYNNMFCAGFHEGGRDS CQGDSGGPHVTEVEGTSFLTGIISWGEECAMKGKYGIYTKVSRYVNWIKEKTKLT >drugbank_drug|DB00046 Insulin Lyspro recombinant - A chain (SEQ ID NO: 228) GIVEQCCTSICSLYQLENYCN >drugbank_drug|DB00046 Insulin Lyspro recombinant - B chain (SEQ ID NO: 229) FVNQHLCGSHLVEALYLVCGERGFFYTKPT >drugbank_drug|DB00088 Alglucerase - DB00088 sequence (SEQ ID NO: 230) ARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYESTRSGRRMELSMGPIQANH TGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIR VPMASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWT SPTWLKTNGAVNGKGSLKGQPGDIYHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGL LSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQRLLLPHWAKVVLTDPE AAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRG MQYSHSIITNLLYHVVGWTDWNLALNPEGGPNWVRNEVDSPIIVDITKDTFYKQPMFYHL GHFSKFIPEGSQRVGLVASQKNDLDAVALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFL ETISPGYSIHTYLWRRQ >drugbank_drug|DB00016 Epoetin alfa - DB00016 sequence (SEQ ID NO: 231) APPRLICDSRVLERYLLEAKEAENITTGCAEHCSLNENITVPDTKVNEYAWKRMEVGQQA VEVWQGLALLSEAVLRGQALLVNSSQPWEPLQLHVDKAVSGLRSLTTLLRALGAQKEAIS PPDAASAAPLRTITADTFRKLFRVYSNFLRGKLKLYTGEACRTGDR >drugbank_drug|DB00057 Satumomab Pendetide - Heavy chain 1 B72.3 (murine) (SEQ ID NO: 232) QVQLQQSDAELVKPGASVKISCKASGYTFTDHAIHWAKQKPEQGLEWIGYISPGNDDIKY NEKFKGKATLTADKSSSTAYMQLNSLTSEDSAVYFCKRSYYGHWGQGTTLTVSSASTKGP 
SVFPLAPCSRSTSESTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLS SVVTVPSSSLGTKTYTCNVDHKPSNTKVDKRVCPPCKCPAPNLLGGPSVFIFPPKIKDVL MISLSPIVTCVVVDVSEDDPDVQISWFVNNVEVHTAQTQTHREDYNSTLRVVSALPIQHQ DWMSGKEFKCKVNNKDLPAPIERTISKPKGSVRAPQVYVLPPPEEEMTKKQVTLTCMVTD EMPEDIYVEWTNNGKTELNYKNTEPVLDSDGSYFMYSKLRVEKKNWVERNSYSCSVVHEG LHNHHTTKSFSR >drugbank_drug|DB00057 Satumomab Pendetide - Heavy chain 2 B72.3 (murine) (SEQ ID NO: 233) QVQLQQSDAELVKPGASVKISCKASGYTFTDHAIHWAKQKPEQGLEWIGYISPGNDDIKY NEKFKGKATLTADKSSSTAYMQLNSLTSEDSAVYFCKRSYYGHWGQGTTLTVSSASTKGP SVFPLAPCSRSTSESTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLS SVVTVPSSSLGTKTYTCNVDHKPSNTKVDKRVCPPCKCPAPNLLGGPSVFIFPPKIKDVL MISLSPIVTCVVVDVSEDDPDVQISWFVNNVEVHTAQTQTHREDYNSTLRVVSALPIQHQ DWMSGKEFKCKVNNKDLPAPIERTISKPKGSVRAPQVYVLPPPEEEMTKKQVTLTCMVTD FMPEDIYVEWTNNGKTELNYKNTEPVLDSDGSYFMYSKLRVEKKNWVERNSYSCSVVHEG LHNHHTTKSFSR >drugbank_drug|DB00057 Satumomab Pendetide - Light chain 1 B72.3 (murine) (SEQ ID NO: 234) DIQMTQSPASLSVSVGETVTITCRASENTYSNLAWYQQKQGKSPQLLVYAATNLADGVPS RFSGSGSGTQYSLKINSLQSEDFGSYYCQHFWGTPYTFGGGTRLEIKRADAAPTVFIFPP SDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLT LSKADYEKHKVYACEVTHQGLSSPVTKSFN >drugbank_drug|DB00057 Satumomab Pendetide - Light chain 2 B72.3 (murine) (SEQ ID NO: 235) DIQMTQSPASLSVSVGETVTITCRASENTYSNLAWYQQKQGKSPQLLVYAATNLADGVPS RFSGSGSGTQYSLKINSLQSEDFGSYYCQHFWGTPYTFGGGTRLEIKRADAAPTVFIFPP SDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLT LSKADYEKHKVYACEVTHQGLSSPVTKSFN >drugbank_drug|DB00045 OspA lipoprotein - DB00045 sequence (SEQ ID NO: 236) CKQNVSSLDEKNSVSVDLPGEMNVLVSKEKNKDGKYDLIATVDKLELKGTSDKNNGSGVL EGVKADKSKVKLTISDDLGQTTLEVFKEDGKTLVSKKVTSKDKSSTEEKFNEKGEVSEKI ITRADGTRLEYTEIKSDGSGKAKEVLKSYVLEGTLTAEKTTLVVKEGTVTLSKNISKSGE VSVELNDTDSSAATKKTAAWNSGTSTLTITVNSKKTKDLVETKENTITVQQYDSNGTKLE GSAVEITKLDEIKNALK >drugbank_drug|DB00068 Interferon beta-1b - DB00068 sequence (SEQ ID NO: 237) MSYNLLGFLQRSSNFQSQKLLWQLNGRLEYCLKDRMNFDIPEEIKQLQQFQKEDAALTIY EMLQNIFAIFRQDSSSTGWNETIVENLLANVYHQINHLKTVLEEKLEKEDFTRGKLMSSL 
HLKRYYGRILHYLKAKEYSHCAWTIVRVEILRNFYFINRLTGYLRN >drugbank_drug|DB00047 Insulin Glargine recombinant - A chain (SEQ ID NO: 238) GIVEQCCTSICSLYQLENYCG >drugbank_drug|DB00047 Insulin Glargine recombinant - B chain (SEQ ID NO: 239) FVNQHLCGSHLVEALYLVCGERGFFYTPKTRR >drugbank_drug|DB00092 Alefacept - DB00092 sequence (SEQ ID NO: 240) CFSQQIYGVVYGNVTFHVPSNVPLKEVLWKKQKDKVAELENSEFRAFSSFKNRVYLDTVS GSLTIYNLTSSDEDEYEMESPNITDTMKFFLYVLESLPSPTLTCALTNGSIEVQCMIPEH YNSHRGLIMYSWDCPMEQCKRNSTSIYFKMENDLPQKIQCTLSNPLENTTSSIILTTCIP SSGHSRHRYALIPIPLAVITTCIVLYMNGILKCDRKPDRTNSNRVEPKSCDKTHTCPPCP APELLGGPSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPQVKFNWYVDGVQVHNAKTK PREQQYNSTYRVVSVLTVLHQNWLDGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYT LPPSREEMTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKL TVDKSRWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00011 Interferon alfa-n1 - DB00011 sequence (SEQ ID NO: 241) CDLPQTHSLGSRRTLMLLAQMRKISLFSCLKDRHDFGFPQEEFGNQFQKAETIPVLHEMI QQIFNLFSTKDSSAAWDETLLDKFYTELYQQLNDLEACVIQGVGVTETPLMKEDSILAVR KYFQRITLYLKEKKYSPCAWEVVRAEIMRSFSLSTNLQESLRSKE >drugbank_drug|DB00030 Insulin recombinant - A chain (SEQ ID NO: 242) GIVEQCCTSICSLYQLENYCN >drugbank_drug|DB00030 Insulin recombinant - B chain (SEQ ID NO: 243) FVNQHLCGSHLVEALYLVCGERGFFYTPKT >drugbank_drug|DB00023 Asparaginase - DB00023 sequence (SEQ ID NO: 244) QMSLQQELRYIEALSAIVETGQKMLEAGESALDVVTEAVRLLEECPLFNAGIGAVFTRDE THELDACVMDGNTLKAGAVAGVSHLRNPVLAARLVMEQSPHVMMIGEGAENFAFARGMER VSPEIFSTSLRYEQLLAARKEGATVLDHSGAPLDEKQKMGTVGAVALDLDGNLAAATSTG GMTNKLPGRVGDSPLVGAGCYANNASVAVSCTGTGEVFIRALAAYDIAALMDYGGLSLAE ACERVVMEKLPALGGSGGLIAIDHEGNVALPFNTEGMYRAWGYAGDTPTTGIYREKGDTVATQ >drugbank_drug|DB00024 Thyrotropin Alfa - Alpha chain (SEQ ID NO: 245) APDVQDCPECTLQENPFFSQPGAPILQCMGCCFSRAYPTPLRSKKTMLVQKNVTSESTCC VAKSYNRVTVMGGFKVENHTACHCSTCYYHKS >drugbank_drug|DB00024 Thyrotropin Alfa - Beta chain (SEQ ID NO: 246) NSCELTNITIAIEKEECRFCISINTTWCAGYCYTRDLVYKDPARPKIQKTCTFKELVYET VRVPGCAHHADSLYTYPVATQCHCGKCDSDSTDCTVRGLGPSYCSFGEMKE >drugbank_drug|DB00082 Pegvisomant - 
DB00082 sequence (SEQ ID NO: 247) FPTIPLSRLFDNAMLRAHRLHQLAFDTYQEFEEAYIPKEQKYSFLQNPQTSLCFSESIPT PSNREETQQKSNLELLRISLLLIQSWLEPVQFLRSVFANSLVYGASDSNVYDLLKDLEEG IQTLMGRLEDGSPRTGQIFKQTYSKFDTNSHNDDALLKNYGLLYCFRKDMDKVETFLRIV QCRSVEGSCGF >drugbank_drug|DB00013 Urokinase - DB00013 sequence (SEQ ID NO: 248) KPSSPPEELKFQCGQKTLRPRFKIIGGEFTTIENQPWFAAIYRRHRGGSVTYVCGGSLMS PCWVISATHCFIDYPKKEDYIVYLGRSRLNSNTQGEMKFEVENLILHKDYSADTLAHHND IALLKIRSKEGRCAQPSRTIQTICLPSMYNDPQFGTSCEITGFGKENSTDYLYPEQLKMT VVKLISHRECQQPHYYGSEVTTKMLCAADPQWKTDSCQGDSGGPLVCSLQGRMTLTGIVS WGRGCALKDKPGVYTRVSHFLPWIRSHTKEENGLAL >drugbank_drug|DB00097 Choriogonadotropin alfa - Alpha chain (SEQ ID NO: 249) APDVQDCPECTLQENPFFSQPGAPILQCMGCCFSRAYPTPLRSKKTMLVQKNVTSESTCC VAKSYNRVTVMGGFKVENHTACHCSTCYYHKS >drugbank_drug|DB00097 Choriogonadotropin alfa - Beta chain (SEQ ID NO: 250) SKEPLRPRCRPINATLAVEKEGCPVCITVNTTICAGYCPTMTRVLQGVLPALPQVVCNYR DVRFESIRLPGCPRGVNPVVSYAVALSCQCALCRRSTTDCGGPKDHPLTCDDPRFQDSSS SKAPPPSLPSPSRLPGPSDTPILPQ >drugbank_drug|DB00111 Daclizumab - Humanized Anti-CD25 Heavy Chain 1 (SEQ ID NO: 251) QVQLVQSGAEVKKPGSSVKVSCKASGYTFTSYRMHWVRQAPGQGLEWIGYINPSTGYTEY NQKFKDKATITADESTNTAYMELSSLRSEDTAVYYCARGGGVFDYWGQGTTLTVSSGPSV FPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSV VTVPSSSLGTQTYICNVNHKPSNTKVDKKAEPKSCDKTHTCPPCPAPELLGGPSVFLFPP KPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSV LTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKNQVSL TCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGNVFSC SVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00111 Daclizumab - Humanized Anti-CD25 Heavy Chain 2 (SEQ ID NO: 252) QVQLVQSGAEVKKPGSSVKVSCKASGYTFTSYRMHWVRQAPGQGLEWIGYINPSTGYTEY NQKFKDKATITADESTNTAYMELSSLRSEDTAVYYCARGGGVFDYWGQGTTLTVSSGPSV FPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSV VTVPSSSLGTQTYICNVNHKPSNTKVDKKAEPKSCDKTHTCPPCPAPELLGGPSVFLFPP KPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSV LTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKNQVSL 
TCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGNVFSC SVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00111 Daclizumab - Humanized Anti-CD25 Light Chain 1 (SEQ ID NO: 253) DIQMTQSPSTLSASVGDRVTITCSASSSISYMHWYQQKPGKAPKLLIYTTSNLASGVPAR FSGSGSGTEFTLTISSLQPDDFATYYCHQRSTYPLTFGSGTKVEVKRTVAAPSVFIFPPS DEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTL SKADYEKHKVYACEVTHQGLSSPVTKSFNR >drugbank_drug|DB00111 Daclizumab - Humanized Anti-CD25 Light Chain 2 (SEQ ID NO: 254) DIQMTQSPSTLSASVGDRVTITCSASSSISYMHWYQQKPGKAPKLLIYTTSNLASGVPAR FSGSGSGTEFTLTISSLQPDDFATYYCHQRSTYPLTFGSGTKVEVKRTVAAPSVFIFPPS DEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTL SKADYEKHKVYACEVTHQGLSSPVTKSFNR >drugbank_drug|DB00034 Interferon Alfa-2a, Recombinant - DB00034 sequence (SEQ ID NO: 255) CDLPQTHSLGSRRTLMLLAQMRKISLFSCLKDRHDFGFPQEEFGNQFQKAETIPVLHEMI QQIFNLFSTKDSSAAWDETLLDKFYTELYQQLNDLEACVIQGVGVTETPLMKEDSILAVR KYFQRITLYLKEKKYSPCAWEVVRAEIMRSFSLSTNLQESLRSKE >drugbank_drug|DB00096 Serum albumin - DB00096 sequence (SEQ ID NO: 256) DAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAE NCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNPNLPRLVRPEV DVMCTAFHDNEETFLKKYLYEIARRHPYFYAPELLFFAKRYKAAFTECCQAADKAACLLP KLDELRDEGKASSAKQRLKCASLQKFGERAFKAWAVARLSQRFPKAEFAEVSKLVTDLTK VHTECCHGDLLECADDRADLAKYICENQDSISSKLKECCEKPLLEKSHCIAEVENDEMPA DLPSLAADFVESKDVCKNYAEAKDVFLGMFLYEYARRHPDYSVVLLLRLAKTYETTLEKC CAAADPHECYAKVFDEFKPLVEEPQNLIKQNCELFEQLGEYKFQNALLVRYTKKVPQVST PTLVEVSRNLGKVGSKCCKHPEAKRMPCAEDYLSVVLNQLCVLHEKTPVSDRVTKCCTES LVNRRPCFSALEVDETYVPKEFNAETFTFHADICTLSEKERQIKKQTALVELVKHKPKAT KEQLKAVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAALGL >drugbank_drug|DB00001 Lepirudin - DB00001 sequence (SEQ ID NO: 257) LVYTDCTESGQNLCLCEGSNVCGQGNKCILGSDGEKNQCVTGEGTPKPQSHNDGDFEEIP EEYLQ >drugbank_drug|DB00044 Lutropin alfa - AlphaChain (SEQ ID NO: 258) APDVQDCPECTLQENPFFSQPGAPILQCMGCCFSRAYPTPLRSKKTMLVQKNVTSESTCC VAKSYNRVTVMGGFKVENHTACHCSTCYYHKS >drugbank_drug|DB00044 Lutropin alfa - BetaChain (LH) (SEQ ID NO: 259) 
SREPLRPWCHPINAILAVEKEGCPVCITVNTTICAGYCPTMMRVLQAVLPPLPQVVCTYR DVRFESIRLPGCPRGVDPVVSFPVALSCRCGPCRRSTSDCGGPKDHPLTCDHPQLSGLLFL >drugbank_drug|DB00070 Hyaluronidase - DB00070 sequence (SEQ ID NO: 260) LNFRAPPVIPNVPFLWAWNAPSEFCLGKFDEPLDMSLFSFIGSPRINATGQGVTIFYVDR LGYYPYIDSITGVTVNGGIPQKISLQDHLDKAKKDITFYMPVDNLGMAVIDWEEWRPTWA RNWKPKDVYKNRSIELVQQQNVQLSLTEATEKAKQEFEKAGKDFLVETIKLGKLLRPNHL WGYYLFPDCYNHHYKKPGYNGSCFNVEIKRNDDLSWLWNESTALYPSIYLNTQQSPVAAT LYVRNRVREAIRVSKIPDAKSPLPVFAYTRIVFTDQVLKFLSQDELVYTFGETVALGASG IVIWGTLSIMRSMKSCLLLDNYMETILNPYIINVTLAAKMCSQVLCQEQGVCIRKNWNSS DYLHLNPDNFAIQLEKGGKFTVRGKPTLEDLEQFSEKFYCSCYSTLSCKEKADVKDTDAV DVCIADGVCIDAFLKPPMETEEPQIFYNASPSTLSATMFIVSILFLIISSVASL >drugbank_drug|DB00031 Tenecteplase - DB00031 sequence (SEQ ID NO: 261) SYQVICRDEKTQMIYQQHQSWLRPVLRSNRVEYCWCNSGRAQCHSVPVKSCSEPRCFNGG TCQQALYFSDFVCQCPEGFAGKCCEIDTRATCYEDQGISYRGNWSTAESGAECTQWNSSA LAQKPYSGRRPDAIRLGLGNHNYCRNPDRDSKPWCYVFKAGKYSSEFCSTPACSEGNSDC YFGNGSAYRGTHSLTESGASCLPWNSMILIGKVYTAQNPSAQALGLGKHNYCRNPDGDAK PWCHVLKNRRLTWEYCDVPSCSTCGLRQYSQPQFRIKGGLFADIASHPWQAAAAAKHRRS PGERFLCGGILISSCWILSAAHCFQERFPPHHLTVILGRTYRVVPGEEEQKFEVEKYIVH KEFDDDTYDNDIALLQLKSDSSRCAQESSVVRTVCLPPADLQLPDWTECELSGYGKHEAL SPFYSERLKEAHVRLYPSSRCTSQHLLNRTVTDNMLCAGDTRSGGPQANLHDACQGDSGG PLVCLNDGRMTLVGIISWGLGCGQKDVPGVYTKVTNYLDWIRDNMRP >drugbank_drug|DB00076 Digoxin Immune Fab - 26-10 Heavy chain (murine) (SEQ ID NO: 262) EVQLQQSGPELVKPGASVRMSCKSSGYIFTDFYMNWVRQSHGKSLDYIGYISPYSGVTGY NQKFKGKATLTVDKSSSTAYMELRSLTSEDSAVYYCAGSSGNKWAMDYWGHGASVTVSSA KTTAPSVYPLAPVCGDTTGSSVTLGCLVKGYFPEPVTLTWNSGSLSSGVHTFPAVLQSDL YTLSSSVTVTSSTWPSQSITCNVAHPASSTKVDKKIEP >drugbank_drug|DB00076 Digoxin Immune Fab - 26-10 Light chain (murine) (SEQ ID NO: 263) DVVMTQTPLSLPVSLGDQASISCRSSQSLVHSNGNTYLNWYLQKAGQSPKLLIYKVSNRF SGVPDRFSGSGSGTDFTLKISRVEAEDLGIYFCSQTTHVPPTFGGGTKLEIKRADAAPTV SIFPPSSEQLTSGGASVVCFLNNFYPKDINVKWKIDGSERQNGVLNSWTDQDSKDSTYSM SSTLTLTKDEYERHNSYTCEATHKTSTSPIVKSFNRNEC >drugbank_drug|DB04900 Thymalfasin - Thymalfasin (SEQ ID NO: 264) 
SDAAVDTSSEITTKDLKEKKEVVEEAEN >drugbank_drug|DB00061 Pegademase bovine - DB00061 sequence (SEQ ID NO: 265) AQTPAFNKPRVELHVHLDGAIKPETILYYGRKRGIALPADTPEELQNIIGMDKPLSLPEF LAKFDYYMPAIAGCREAVKRIAYEFVEMKAKDGVVYVEVRYSPHLLANSKVEPIPWNQAE GDLTPDEVVSLVNQGLQEGERDFGVKVRSILCCMRHQPSWSSEVVELCKKYREQTVVAID LAGDETIEGSSLFPGHVKAYAEAVKSGVHRTVHAGEVGSANVVKEAVDTLKTERLGHGYH TLEDATLYNRLRQENMHFEVCPWSSYLTGAWKPDTEHPVVRFKNDQVNYSLNTDDPLIFK STLDTDYQMTKNEMGFTEEEFKRLNINAAKSSFLPEDEKKELLDLLYKAYGMPSPASAEQCL >drugbank_drug|DB00004 Denileukin diftitox - DB00004 sequence (SEQ ID NO: 266) MGADDVVDSSKSFVMENFSSYHGTKPGYVDSIQKGIQKPKSGTQGNYDDDWKGFYSTDNK YDAAGYSVDNENPLSGKAGGVVKVTYPGLTKVLALKVDNAETIKKELGLSLTEPLMEQVG TEEFIKRFGDGASRVVLSLPFAEGSSSVEYINNWEQAKALSVELEINFETRGKRGQDAMY EYMAQACAGNRVRRSVGSSLSCINLDWDVIRDKTKTKIESLKEHGPIKNKMSESPNKTVS EEKAKQYLEEFHQTALEHPELSELKTVTGTNPVFAGANYAAWAVNVAQVIDSETADNLEK TTAALSILPGIGSVMGIADGAVHHNTEEIVAQSIALSSLMVAQAIPLVGELVDIGFAAYN FVESIINLFQVVHNSYNRPAYSPGHKTHAPTSSSTKKTQLQLEHLLLDLQMILNGINNYK NPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVI VLELKGSETTFMCEYADETATIVEFLNRWITFCQSIISTLT >drugbank_drug|DB00042 Botulinum Toxin Type B - Botulinum neurotoxin type B - Clostridium botulinum (SEQ ID NO: 267) MPVTINNFNYNDPIDNNNIIMMEPPFARGTGRYYKAFKITDRIWIIPERYTFGYKPEDFN KSSGIFNRDVCEYYDPDYLNTNDKKNIFLQTMIKLFNRIKSKPLGEKLLEMIINGIPYLG DRRVPLEEFNTNIASVTVNKLISNPGEVERKKGIFANLIIFGPGPVLNENETIDIGIQNH FASREGFGGIMQMKFCPEYVSVFNNVQENKGASIFNRRGYFSDPALILMHELIHVLHGLY GIKVDDLPIVPNEKKFFMQSTDAIQAEELYTFGGQDPSIITPSTDKSIYDKVLQNFRGIV DRLNKVLVCISDPNININIYKNKFKDKYKFVEDSEGKYSIDVESFDKLYKSLMFGFTETN IAENYKIKTRASYFSDSLPPVKIKNLLDNEIYTIEEGFNISDKDMEKEYRGQNKAINKQA YEEISKEHLAVYKIQMCKSVKAPGICIDVDNEDLFFIADKNSFSDDLSKNERIEYNTQSN YIENDFPINELILDTDLISKIELPSENTESLTDFNVDVPVYEKQPAIKKIFTDENTIFQY LYSQTFPLDIRDISLTSSFDDALLFSNKVYSFFSMDYIKTANKVVEAGLFAGWVKQIVND FVIEANKSNTMDKIADISLIVPYIGLALNVGNETAKGNFENAFEIAGASILLEFIPELLI PVVGAFLLESYIDNKNKIIKTIDNALTKRNEKWSDMYGLIVAQWLSTVNTQFYTIKEGMY KALNYQAQALEEIIKYRYNIYSEKEKSNINIDFNDINSKLNEGINQAIDNINNFINGCSV 
SYLMKKMIPLAVEKLLDFDNTLKKNLLNYIDENKLYLIGSAEYEKSKVNKYLKTIMPFDL SIYTNDTILIEMFNKYNSEILNNIILNLRYKDNNLIDLSGYGAKVEVYDGVELNDKNQFK LTSSANSKIRVTQNQNIIFNSVFLDFSVSFWIRIPKYKNDGIQNYIHNEYTIINCMKNNS GWKISIRGNRIIWTLIDINGKTKSVFFEYNIREDISEYINRWFFVTITNNLNNAKIYING KLESNTDIKDIREVIANGEIIFKLDGDIDRTQFIWMKYFSIFNTELSQSNIEERYKIQSY SEYLKDFWGNPLMYNKEYYMFNAGNKNSYIKLKKDSPVGEILTRSKYNQNSKYINYRDLY IGEKFIIRRKSNSQSINDDIVRKEDYIYLDFFNLNQEWRVYTYKYFKKEEEKLFLAPISD SDEFYNTIQIKEYDEQPTYSCQLLFKKDEESTDEIGLIGIHRFYESGIVFEEYKDYFCIS KWYLKEVKRKPYNLKLGCNWQFIPKDEGWTE >drugbank_drug|DB04899 Nesiritide - DB04899: Natriuretic peptides B (SEQ ID NO: 268) SPKMVQGSGCFGRKMDRISSSSGLGCKVLRRH >drugbank_drug|DB00107 Oxytocin - DB00107 sequence (SEQ ID NO: 269) CYIQNCPLG >drugbank_drug|DB00003 Dornase Alfa - DB00003 sequence (SEQ ID NO: 270) LKIAAFNIQTFGETKMSNATLVSYIVQILSRYDIALVQEVRDSHLTAVGKLLDNLNQDAP DTYHYVVSEPLGRNSYKERYLFVYRPDQVSAVDSYYYDDGCEPCGNDTFNREPAIVRFFS RFTEVREFAIVPLHAAPGDAVAEIDALYDVYLDVQEKWGLEDVMLMGDFNAGCSYVRPSQ WSSIRLWTSPTFQWLIPDSADTTATPTHCAYDRIVVAGMLLRGAVVPDSALPFNFQAAYG LSDQLAQAISDHYPVEVMLK >drugbank_drug|DB00002 Cetuximab - Anti-EGFR heavy chain 1 (SEQ ID NO: 271) QVQLKQSGPGLVQPSQSLSITCTVSGFSLTNYGVHWVRQSPGKGLEWLGVIWSGGNTDYN TPFTSRLSINKDNSKSQVFFKMNSLQSNDTAIYYCARALTYYDYEFAYWGQGTLVTVSAA STKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSG LYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKRVEPKSPKSCDKTHTCPPCPAPELL GGPSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQ YNSTYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSR DELTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKS RWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00002 Cetuximab - Anti-EGFR heavy chain 2 (SEQ ID NO: 272) QVQLKQSGPGLVQPSQSLSITCTVSGFSLTNYGVHWVRQSPGKGLEWLGVIWSGGNTDYN TPFTSRLSINKDNSKSQVFFKMNSLQSNDTAIYYCARALTYYDYEFAYWGQGTLVTVSAA STKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSG LYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKRVEPKSPKSCDKTHTCPPCPAPELL GGPSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQ 
YNSTYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSR DELTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKS RWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00002 Cetuximab - Anti-EGFR light chain 1 (SEQ ID NO: 273) DILLTQSPVILSVSPGERVSFSCRASQSIGTNIHWYQQRTNGSPRLLIKYASESISGIPS RFSGSGSGTDFTLSINSVESEDIADYYCQQNNNWPTTFGAGTKLELKRTVAAPSVFIFPP SDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLT LSKADYEKHKVYACEVTHQGLSSPVTKSFNRGA >drugbank_drug|DB00002 Cetuximab - Anti-EGFR light chain 2 (SEQ ID NO: 274) DILLTQSPVILSVSPGERVSFSCRASQSIGTNIHWYQQRTNGSPRLLIKYASESISGIPS RFSGSGSGTDFTLSINSVESEDIADYYCQQNNNWPTTFGAGTKLELKRTVAAPSVFIFPP SDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLT LSKADYEKHKVYACEVTHQGLSSPVTKSFNRGA >drugbank_drug|DB00019 Pegfilgrastim - DB00019 sequence (SEQ ID NO: 275) MTPLGPASSLPQSFLLKCLEQVRKIQGDGAALQEKLCATYKLCHPEELVLLGHSLGIPWA PLSSCPSQALQLAGCLSQLHSGLFLYQGLLQALEGISPELGPTLDTLQLDVADFATTIWQ QMEELGMAPALQPTQGAMPAFASAFQRRAGGVLVASHLQSFLEVSYRVLRHLAQP >drugbank_drug|DB00036 Coagulation factor VIIa - DB00036 sequence (SEQ ID NO: 276) ANAFLEELRPGSLERECKEEQCSFEEAREIFKDAERTKLFWISYSDGDQCASSPCQNGGS CKDQLQSYICFCLPAFEGRNCETHKDDQLICVNENGGCEQYCSDHTGTKRSCRCHEGYSL LADGVSCTPTVEYPCGKIPILEKRNASKPQGRIVGGKVCPKGECPWQVLLLVNGAQLCGG TLINTIWVVSAAHCFDKIKNWRNLIAVLGEHDLSEHDGDEQSRRVAQVIIPSTYVPGTTN HDIALLRLHQPVVLTDHVVPLCLPERTFSERTLAFVRFSLVSGWGQLLDRGATALELMVL NVPRLMTQDCLQQSRKVGDSPNITEYMFCAGYSDGSKDSCKGDSGGPHATHYRGTWYLTG IVSWGQGCATVGHFGVYTRVSQYIEWLQKLMRSEPRPGVLLRAPFP >drugbank_drug|DB00081 Tositumomab - Mouse-Human chimeric Anti-CD20 Heavy Chain 1 (SEQ ID NO: 277) QAYLQQSGAELVRPGASVKMSCKASGYTFTSYNMHWVKQTPRQGLEWIGAIYPGNGDTSY NQKFKGKATLTVDKSSSTAYMQLSSLTSEDSAVYFCARVVYYSNSYWYFDVWGTGTTVTV SGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLY SLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKAEPKSCDKTHTCPPCPAPELLGGPSV FLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTY RVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTK 
NQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQG NVFSCSVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00081 Tositumomab - Mouse-Human chimeric Anti-CD20 Heavy Chain 2 (SEQ ID NO: 278) QAYLQQSGAELVRPGASVKMSCKASGYTFTSYNMHWVKQTPRQGLEWIGAIYPGNGDTSY NQKFKGKATLTVDKSSSTAYMQLSSLTSEDSAVYFCARVVYYSNSYWYFDVWGTGTTVTV SGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLY SLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKAEPKSCDKTHTCPPCPAPELLGGPSV FLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTY RVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTK NQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQG NVFSCSVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00081 Tositumomab - Mouse-Human chimeric Anti-CD20 Light Chain 1 (SEQ ID NO: 279) QIVLSQSPAILSASPGEKVTMTCRASSSVSYMHWYQQKPGSSPKPWIYAPSNLASGVPAR FSGSGSGTSYSLTISRVEAEDAATYYCQQWSFNPPTFGAGTKLELKRTVAAPSVFIFPPS DEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTL SKADYEKHKVYACEVTHQGLSSPVTKSFNR >drugbank_drug|DB00081 Tositumomab - Mouse-Human chimeric Anti-CD20 Light Chain 2 (SEQ ID NO: 280) QIVLSQSPAILSASPGEKVTMTCRASSSVSYMHWYQQKPGSSPKPWIYAPSNLASGVPAR FSGSGSGTSYSLTISRVEAEDAATYYCQQWSFNPPTFGAGTKLELKRTVAAPSVFIFPPS DEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTL SKADYEKHKVYACEVTHQGLSSPVTKSFNR >drugbank_drug|DB00062 Human Serum Albumin - DB00062 sequence (SEQ ID NO: 281) DAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAE NCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNPNLPRLVRPEV DVMCTAFHDNEETFLKKYLYEIARRHPYFYAPELLFFAKRYKAAFTECCQAADKAACLLP KLDELRDEGKASSAKQRLKCASLQKFGERAFKAWAVARLSQRFPKAEFAEVSKLVTDLTK VHTECCHGDLLECADDRADLAKYICENQDSISSKLKECCEKPLLEKSHCIAEVENDEMPA DLPSLAADFVESKDVCKNYAEAKDVFLGMFLYEYARRHPDYSVVLLLRLAKTYETTLEKC CAAADPHECYAKVFDEFKPLVEEPQNLIKQNCELFEQLGEYKFQNALLVRYTKKVPQVST PTLVEVSRNLGKVGSKCCKHPEAKRMPCAEDYLSVVLNQLCVLHEKTPVSDRVTKCCTES LVNRRPCFSALEVDETYVPKEFNAETFTFHADICTLSEKERQIKKQTALVELVKHKPKAT KEQLKAVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAALGL >drugbank_drug|DB01277 Mecasermin - Mecasermin (SEQ ID 
NO: 282) GPEILCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMY CAPLKPAKSA >drugbank_drug|DB00041 Aldesleukin - DB00041 sequence (SEQ ID NO: 283) PTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEE ELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSETTFMCEYADETATIVEFLNRW ITFAQSIISTLT >drugbank_drug|DB00029 Anistreplase - DB00029 sequence (SEQ ID NO: 284) SYQVICRDEKTQMIYQQHQSWLRPVLRSNRVEYCWCNSGRAQCHSVPVKSCSEPRCFNGG TCQQALYFSDFVCQCPEGFAGKCCEIDTRATCYEDQGISYRGTWSTAESGAECTNWNSSA LAQKPYSGRRPDAIRLGLGNHNYCRNPDRDSKPWCYVFKAGKYSSEFCSTPACSEGNSDC YFGNGSAYRGTHSLTESGASCLPWNSMILIGKVYTAQNPSAQALGLGKHNYCRNPDGDAK PWCHVLKNRRLTWEYCDVPSCSTCGLRQYSQPQFRIKGGLFADIASHPWQAAIFAKHRRS PGERFLCGGILISSCWILSAAHCFQERFPPHHLTVILGRTYRVVPGEEEQKFEVEKYIVH KEFDDDTYDNDIALLQLKSDSSRCAQESSVVRTVCLPPADLQLPDWTECELSGYGKHEAL SPFYSERLKEAHVRLYPSSRCTSQHLLNRTVTDNMLCAGDTRSGGPQANLHDACQGDSGG PLVCLNDGRMTLVGIISWGLGCGQKDVPGVYTKVTNYLDWIRDNMRP >drugbank_drug|DB00071 Insulin, porcine - A chain (SEQ ID NO: 285) GIVEQCCTSICSLYQLENYCN >drugbank_drug|DB00071 Insulin, porcine - B chain (SEQ ID NO: 286) FVNQHLCGSHLVEALYLVCGERGFFYTPKT >drugbank_drug|DB00025 Antihemophilic Factor - DB00025 sequence (SEQ ID NO: 287) ATRRYYLGAVELSWDYMQSDLGELPVDARFPPRVPKSFPFNTSVVYKKTLFVEFTDHLFN IAKPRPPWMGLLGPTIQAEVYDTVVITLKNMASHPVSLHAVGVSYWKASEGAEYDDQTSQ REKEDDKVFPGGSHTYVWQVLKENGPMASDPLCLTYSYLSHVDLVKDLNSGLIGALLVCR EGSLAKEKTQTLHKFILLFAVFDEGKSWHSETKNSLMQDRDAASARAWPKMHTVNGYVNR SLPGLIGCHRKSVYWHVIGMGTTPEVHSIFLEGHTFLVRNHRQASLEISPITFLTAQTLL MDLGQFLLFCHISSHQHDGMEAYVKVDSCPEEPQLRMKNNEEAEDYDDDLTDSEMDVVRF DDDNSPSFIQIRSVAKKHPKTWVHYIAAEEEDWDYAPLVLAPDDRSYKSQYLNNGPQRIG RKYKKVRFMAYTDETFKTREAIQHESGILGPLLYGEVGDTLLIIFKNQASRPYNTYPHGI TDVRPLYSRRLPKGVKHLKDFPILPGEIFKYKWTVTVEDGPTKSDPRCLTRYYSSFVNME RDLASGLIGPLLICYKESVDQRGNQIMSDKRNVILFSVFDENRSWYLTENIQRFLPNPAG VQLEDPEFQASNIMHSINGYVFDSLQLSVCLHEVAYWYILSIGAQTDFLSVFFSGYTFKH KMVYEDTLTLFPFSGETVFMSMENPGLWILGCHNSDFRNRGMTALLKVSSCDKNTGDYYE DSYEDISAYLLSKNNAIEPRSFSQNSRHPSTRQKQFNATTIPENDIEKTDPWFAHRTPMP 
KIQNVSSSDLLMLLRQSPTPHGLSLSDLQEAKYETFSDDPSPGAIDSNNSLSEMTHFRPQ LHHSGDMVFTPESGLQLRLNEKLGTTAATELKKLDFKVSSTSNNLISTIPSDNLAAGTDN TSSLGPPSMPVHYDSQLDTTLFGKKSSPLTESGGPLSLSEENNDSKLLESGLMNSQESSW GKNVSSTESGRLFKGKRAHGPALLTKDNALFKVSISLLKTNKTSNNSATNRKTHIDGPSL LIENSPSVWQNILESDTEFKKVTPLIHDRMLMDKNATALRLNHMSNKTTSSKNMEMVQQK KEGPIPPDAQNPDMSFFKMLFLPESARWIQRTHGKNSLNSGQGPSPKQLVSLGPEKSVEG QNFLSEKNKVVVGKGEFTKDVGLKEMVFPSSRNLFLTNLDNLHENNTHNQEKKIQEEIEK KETLIQENVVLPQIHTVTGTKNFMKNLFLLSTRQNVEGSYDGAYAPVLQDFRSLNDSTNR TKKHTAHFSKKGEEENLEGLGNQTKQIVEKYACTTRISPNTSQQNFVTQRSKRALKQFRL PLEETELEKRIIVDDTSTQWSKNMKHLTPSTLTQIDYNEKEKGAITQSPLSDCLTRSHSI PQANRSPLPIAKVSSFPSIRPTYLTRVLFQDNSSHLPAASYRKKDSGVQESSHFLQGAKK NNLSLAILTLEMTGDQREVGSLGTSATNSVTYKKVENTVLPKPDLPKTSGKVELLPKVHI YQKDLFPTETSNGSPGHLDLVEGSLLQGTEGAIKWNEANRPGKVPFLRVATESSAKTPSK LLDPLAWDNHYGTQIPKEEWKSQEKSPEKTAFKKKDTILSLNACESNHAIAAINEGQNKP EIEVTWAKQGRTERLCSQNPPVLKRHQREITRTTLQSDQEEIDYDDTISVEMKKEDFDIY DEDENQSPRSFQKKTRHYFIAAVERLWDYGMSSSPHVLRNRAQSGSVPQFKKVVFQEFTD GSFTQPLYRGELNEHLGLLGPYIRAEVEDNIMVTFRNQASRPYSFYSSLISYEEDQRQGA EPRKNFVKPNETKTYFWKVQHHMAPTKDEFDCKAWAYFSDVDLEKDVHSGLIGPLLVCHT NTLNPAHGRQVTVQEFALFFTIFDETKSWYFTENMERNCRAPCNIQMEDPTFKENYRFHA INGYIMDTLPGLVMAQDQRIRWYLLSMGSNENIHSIHFSGHVFTVRKKEEYKMALYNLYP GVFETVEMLPSKAGIWRVECLIGEHLHAGMSTLFLVYSNKCQTPLGMASGHIRDFQITAS GQYGQWAPKLARLHYSGSINAWSTKEPFSWIKVDLLAPMIIHGIKTQGARQKFSSLYISQ FIIMYSLDGKKWQTYRGNSTGTLMVFFGNVDSSGIKHNIFNPPIIARYIRLHPTHYSIRS TLRMELMGCDLNSCSMPLGMESKAISDAQITASSYFTNMFATWSPSKARLHLQGRSNAWR PQVNNPKEWLQVDFQKTMKVTGVTTQGVKSLLTSMYVKEFLISSSQDGHQWTLFFQNGKV KVFQGNQDSFTPVVNSLDPPLLTRYLRIHPQSWVHQIALRMEVLGCEAQDLY >drugbank_drug|DB00059 Pegaspargase - DB00059 sequence (SEQ ID NO: 288) QMSLQQELRYIEALSAIVETGQKMLEAGESALDVVTEAVRLLEECPLFNAGIGAVFTRDE THELDACVMDGNTLKAGAVAGVSHLRNPVLAARLVMEQSPHVMMIGEGAENFAFARGMER VSPEIFSTSLRYEQLLAARKEGATVLDHSGAPLDEKQKMGTVGAVALDLDGNLAAATSTG GMTNKLPGRVGDSPLVGAGCYANNASVAVSCTGTGEVFIRALAAYDIAALMDYGGLSLAE ACERVVMEKLPALGGSGGLIAIDHEGNVALPFNTEGMYRAWGYAGDTPTTGIYREKGDTVATQ >drugbank_drug|DB00009 Alteplase - 
DB00009 sequence (SEQ ID NO: 289) SYQVICRDEKTQMIYQQHQSWLRPVLRSNRVEYCWCNSGRAQCHSVPVKSCSEPRCFNGG TCQQALYFSDFVCQCPEGFAGKCCEIDTRATCYEDQGISYRGTWSTAESGAECTNWNSSA LAQKPYSGRRPDAIRLGLGNHNYCRNPDRDSKPWCYVFKAGKYSSEFCSTPACSEGNSDC YFGNGSAYRGTHSLTESGASCLPWNSMILIGKVYTAQNPSAQALGLGKHNYCRNPDGDAK PWCHVLKNRRLTWEYCDVPSCSTCGLRQYSQPQFRIKGGLFADIASHPWQAAIFAKHRRS PGERFLCGGILISSCWILSAAHCFQERFPPHHLTVILGRTYRVVPGEEEQKFEVEKYIVH KEFDDDTYDNDIALLQLKSDSSRCAQESSVVRTVCLPPADLQLPDWTECELSGYGKHEAL SPFYSERLKEAHVRLYPSSRCTSQHLLNRTVTDNMLCAGDTRSGGPQANLHDACQGDSGG PLVCLNDGRMTLVGIISWGLGCGQKDVPGVYTKVTNYLDWIRDNMRP >drugbank_drug|DB06692 Aprotinin - Aprotinin (bovine pancreatic trypsin inhibitor) (SEQ ID NO: 290) RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA >drugbank_drug|DB00039 Palifermin - DB00039 sequence (SEQ ID NO: 291) YDYMEGGDIRVRRLFCRTQWYLRIDKRGKVKGTQEMKNNYNIMEIRTVAVGIVAIKGVES EFYLAMNKEGKLYAKKECNEDCNFKELILENHYNTYASAKWTHNGGEMFVALNQKGIPVR GKKTKKEQKTAHFLPMAIT >drugbank_drug|DB00072 Trastuzumab - Anti-HER2 Heavy chain 1 (SEQ ID NO: 292) EVQLVESGGGLVQPGGSLRLSCAASGFNIKDTYIHWVRQAPGKGLEWVARIYPTNGYTRY ADSVKGRFTISADTSKNTAYLQMNSLRAEDTAVYYCSRWGGDGFYAMDYWGQGTLVTVSS ASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSS GLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKVEPPKSCDKTHTCPPCPAPELLG GPSVFLEPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQY NSTYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRD ELTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSR WQQGNVFSCSVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00072 Trastuzumab - Anti-HER2 Heavy chain 2 (SEQ ID NO: 293) EVQLVESGGGLVQPGGSLRLSCAASGFNIKDTYIHWVRQAPGKGLEWVARIYPTNGYTRY ADSVKGRFTISADTSKNTAYLQMNSLRAEDTAVYYCSRWGGDGFYAMDYWGQGTLVTVSS ASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSS GLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKVEPPKSCDKTHTCPPCPAPELLG GPSVFLEPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQY NSTYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRD ELTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSR 
WQQGNVFSCSVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00072 Trastuzumab - Anti-HER2 Light chain 1 (SEQ ID NO: 294) DIQMTQSPSSLSASVGDRVTITCRASQDVNTAVAWYQQKPGKAPKLLIYSASFLYSGVPS RFSGSRSGTDFTLTISSLQPEDFATYYCQQHYTTPPTFGQGTKVEIKRTVAAPSVFIFPP SDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLT LSKADYEKHKVYACEVTHQGLSSPVTKSFNRGEC >drugbank_drug|DB00072 Trastuzumab - Anti-HER2 Light chain 2 (SEQ ID NO: 295) DIQMTQSPSSLSASVGDRVTITCRASQDVNTAVAWYQQKPGKAPKLLIYSASFLYSGVPS RFSGSRSGTDFTLTISSLQPEDFATYYCQQHYTTPPTFGQGTKVEIKRTVAAPSVFIFPP SDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLT LSKADYEKHKVYACEVTHQGLSSPVTKSFNRGEC >drugbank_drug|DB00075 Muromonab - 1SY6:H OKT3 Heavy Chain 1 (SEQ ID NO: 296) QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNY NQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSSA KTTAPSVYPLAPVCGGTTGSSVTLGCLVKGYFPEPVTLTWNSGSLSSGVHTFPAVLQSDL YTLSSSVTVTSSTWPSQSITCNVAHPASSTKVDKKIEPRPKSCDKTHTCPPCPAPELLGG PSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYN STYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDE LTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRW QQGNVFSCSVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00075 Muromonab - 1SY6:H OKT3 Heavy Chain 2 (SEQ ID NO: 297) QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNY NQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSSA KTTAPSVYPLAPVCGGTTGSSVTLGCLVKGYFPEPVTLTWNSGSLSSGVHTFPAVLQSDL YTLSSSVTVTSSTWPSQSITCNVAHPASSTKVDKKIEPRPKSCDKTHTCPPCPAPELLGG PSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYN STYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDE LTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRW QQGNVFSCSVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00075 Muromonab - 1SY6:L OKT3 Light Chain 1 (SEQ ID NO: 298) QIVLTQSPAIMSASPGEKVTMTCSASSSVSYMNWYQQKSGTSPKRWIYDTSKLASGVPAH FRGSGSGTSYSLTISGMEAEDAATYYCQQWSSNPFTFGSGTKLEINRADTAPTVSIFPPS SEQLTSGGASVVCFLNNTYPKDINVKWKIDGSERQNGVLNSWTDQDSKDSTYSMSSTLTL TKDEYERHNSYTCEATHKTSTSPIVKSFNRNEC 
>drugbank_drug|DB00075 Muromonab - 1SY6:L OKT3 Light Chain 2 (SEQ ID NO: 299) QIVLTQSPAIMSASPGEKVTMTCSASSSVSYMNWYQQKSGTSPKRWIYDTSKLASGVPAH FRGSGSGTSYSLTISGMEAEDAATYYCQQWSSNPFTFGSGTKLEINRADTAPTVSIFPPS SEQLTSGGASVVCFLNNFYPKDINVKWKIDGSERQNGVLNSWTDQDSKDSTYSMSSTLTL TKDEYERHNSYTCEATHKTSTSPIVKSFNRNEC >drugbank_drug|DB00008 Peginterferon alfa-2a - DB00008 sequence (SEQ ID NO: 300) CDLPQTHSLGSRRTLMLLAQMRKISLFSCLKDRHDEGFPQEEFGNQFQKAETIPVLHEMI QQIFNLFSTKDSSAAWDETLLDKFYTELYQQLNDLEACVIQGVGVTETPLMKEDSILAVR KYFQRITLYLKEKKYSPCAWEVVRAEIMRSFSLSTNLQESLRSKE >drugbank_drug|DB06285 Teriparatide - Parathyroid hormone precursor - Homo sapiens (1-34) (SEQ ID NO: 301) SVSEIQLMHNLGKHLNSMERVEWLRKKLQDVHNF >drugbank_drug|DB00054 Abciximab - 1TXV: H ReoPro-like antibody Heavy Chain 1 (SEQ ID NO: 302) EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYVHWVKQRPEQGLEWIGRIDPANGYTKY DPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCVRPLYDYYAMDYWGQGTSVTVSSA KTTAPSVYPLAPVCGDTTGSSVTLGCLVKGYFPEPVTLTWNSGSLSSGVHTFPAVLQSDL YTLSSSVTVTSSTWPSQSITCNVAHPASSTKVDKKIEPRPKSCDKTHTCPPCPAPELLGG PSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYN STYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDE LTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRW QQGNVFSCSVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00054 Abciximab - 1TXV: H ReoPro-like antibody Heavy Chain 2 (SEQ ID NO: 303) EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYVHWVKQRPEQGLEWIGRIDPANGYTKY DPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCVRPLYDYYAMDYWGQGTSVTVSSA KTTAPSVYPLAPVCGDTTGSSVTLGCLVKGYFPEPVTLTWNSGSLSSGVHTFPAVLQSDL YTLSSSVTVTSSTWPSQSITCNVAHPASSTKVDKKIEPRPKSCDKTHTCPPCPAPELLGG PSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYN STYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDE LTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRW QQGNVFSCSVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00054 Abciximab - 1TXV: L ReoPro-like antibody Light Chain 1 (SEQ ID NO: 304) DILMTQSPSSMSVSLGDTVSITCHASQGISSNIGWLQQKPGKSFMGLIYYGTNLVDGVPS 
RFSGSGSGADYSLTISSLDSEDFADYYCVQYAQLPYTFGGGTKLEIKRADAAPTVSIFPP SSEQLTSGGASVVCFLNNFYPKDINVKWKIDGSERQNGVLNSWTDQDSKDSTYSMSSTLT LTKDEYERHNSYTCEATHKTSTSPIVKSFNRNEC >drugbank_drug|DB00054 Abciximab - 1TXV: L ReoPro-like antibody Light Chain 2 (SEQ ID NO: 305) DILMTQSPSSMSVSLGDTVSITCHASQGISSNIGWLQQKPGKSFMGLIYYGTNLVDGVPS RFSGSGSGADYSLTISSLDSEDFADYYCVQYAQLPYTFGGGTKLEIKRADAAPTVSIFPP SSEQLTSGGASVVCFLNNFYPKDINVKWKIDGSERQNGVLNSWTDQDSKDSTYSMSSTLT LTKDEYERHNSYTCEATHKTSTSPIVKSFNRNEC >drugbank_drug|DB00094 Urofollitropin - Alpha chain (SEQ ID NO: 306) APDVQDCPECTLQENPFFSQPGAPILQCMGCCFSRAYPTPLRSKKTMLVQKNVTSESTCC VAKSYNRVTVMGGFKVENHTACHCSTCYYHKS >drugbank_drug|DB00094 Urofollitropin - Beta chain (SEQ ID NO: 307) NSCELTNITIAIEKEECRFCISINTTWCAGYCYTRDLVYKDPARPKIQKTCTFKELVYET VRVPGCAHHADSLYTYPVATQCHCGKCDSDSTDCTVRGLGPSYCSFGEMKE >drugbank_drug|DB00033 Interferon gamma-1b - DB00033 sequence (SEQ ID NO: 308) CYCQDPYVKEAENLKKYFNAGHSDVADNGTLFLGILKNWKEESDRKIMQSQIVSFYFKLF KNFKDDQSIQKSVETIKEDMNVKFFNSNKKKRDDFEKLTNYSVTDLNVQRKAIHELIQVM AELSPAAKTGKRKRSQMLFRGRRASQ >drugbank_drug|DB00026 Anakinra - DB00026 sequence (SEQ ID NO: 309) MRPSGRKSSKMQAFRIWDVNQKTFYLRNNQLVAGYLQGPNVNLEEKIDVVPIEPHALFLG IHGGKMCLSCVKSGDETRLQLEAVNITDLSENRKQDKRFAFIRSDSGPTTSFESAACPGW FLCTAMEADQPVSLTNMPDEGVMVTKFYFQEDE >drugbank_drug|DB00021 Secretin - DB00021 sequence (SEQ ID NO: 310) HSDGTFTSELSRLRDSARLQRLLQGLV >drugbank_drug|DB00006 Bivalirudin - DB00006 sequence (SEQ ID NO: 311) FPRPGGGGNGDFEEIPEEYL >drugbank_drug|DB01285 Corticotropin - ACTH(1-39) (SEQ ID NO: 312) SYSMEHFRWGKPVGKKRRPVKVYPDGAEDQLAEAFPLEF >drugbank_drug|DB00074 Basiliximab - 1MIM: H Anti-CD25 antibody heavy CHIMERIC chain 1 (SEQ ID NO: 313) QLQQSGTVLARPGASVKMSCKASGYSFTRYWMHWIKQRPGQGLEWIGAIYPGNSDTSYNQ KFEGKAKLTAVTSASTAYMELSSLTHEDSAVYYCSRDYGYYFDFWGQGTTLTVSSASTKG PSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSL SSVVTVPSSSLGTQTYICNVNHKPSNTKVDKRVEPPKSCDKTHTCPPCPAPELLGGPSVF LFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYR 
VVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKN QVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGN VFSCSVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00074 Basiliximab - 1MIM: H Anti-CD25 antibody heavy CHIMERIC chain 2 (SEQ ID NO: 314) QLQQSGTVLARPGASVKMSCKASGYSFTRYWMHWIKQRPGQGLEWIGAIYPGNSDTSYNQ KFEGKAKLTAVTSASTAYMELSSLTHEDSAVYYCSRDYGYYFDFWGQGTTLTVSSASTKG PSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSL SSVVTVPSSSLGTQTYICNVNHKPSNTKVDKRVEPPKSCDKTHTCPPCPAPELLGGPSVF LFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYR VVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKN QVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGN VFSCSVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00074 Basiliximab - 1MIM: L Anti-CD25 antibody light CHIMERIC chain 1 (SEQ ID NO: 315) QIVSTQSPAIMSASPGEKVTMTCSASSSRSYMQWYQQKPGTSPKRWIYDTSKLASGVPAR FSGSGSGTSYSLTISSMEAEDAATYYCHQRSSYTFGGGTKLEIKRTVAAPSVFIFPPSDE QLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTLSK ADYEKHKVYACEVTHQGLSSPVTKSFNRGE >drugbank_drug|DB00074 Basiliximab - 1MIM: L Anti-CD25 antibody light CHIMERIC chain 2 (SEQ ID NO: 316) QIVSTQSPAIMSASPGEKVTMTCSASSSRSYMQWYQQKPGTSPKRWIYDTSKLASGVPAR FSGSGSGTSYSLTISSMEAEDAATYYCHQRSSYTFGGGTKLEIKRTVAAPSVFIFPPSDE QLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTLSK ADYEKHKVYACEVTHQGLSSPVTKSFNRGE >drugbank_drug|DB01276 Exenatide - Exenatide (SEQ ID NO: 317) HGEGTFTSDLSKQMEEEAVRLFIEWLKNGGPSSGAPPPS >drugbank_drug|DB00073 Rituximab - Mouse-Human chimeric Anti-CD20 Heavy Chain 1 (SEQ ID NO: 318) QAYLQQSGAELVRPGASVKMSCKASGYTFTSYNMHWVKQTPRQGLEWIGAIYPGNGDTSY NQKFKGKATLTVDKSSSTAYMQLSSLTSEDSAVYFCARVVYYSNSYWYFDVWGTGTTVTV SGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLY SLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKAEPKSCDKTHTCPPCPAPELLGGPSV FLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTY RVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTK NQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQG NVFSCSVMHEALHNHYTQKSLSLSPGK 
>drugbank_drug|DB00073 Rituximab - Mouse-Human chimeric Anti-CD20 Heavy Chain 2 (SEQ ID NO: 319) QAYLQQSGAELVRPGASVKMSCKASGYTFTSYNMHWVKQTPRQGLEWIGAIYPGNGDTSY NQKFKGKATLTVDKSSSTAYMQLSSLTSEDSAVYFCARVVYYSNSYWYFDVWGTGTTVTV SGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLY SLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKAEPKSCDKTHTCPPCPAPELLGGPSV FLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTY RVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTK NQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQG NVFSCSVMHEALHNHYTQKSLSLSPGK >drugbank_drug|DB00073 Rituximab - Mouse-Human chimeric Anti-CD20 Light Chain 1 (SEQ ID NO: 320) QIVLSQSPAILSASPGEKVTMTCRASSSVSYMHWYQQKPGSSPKPWIYAPSNLASGVPAR FSGSGSGTSYSLTISRVEAEDAATYYCQQWSFNPPTFGAGTKLELKRTVAAPSVFIFPPS DEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTL SKADYEKHKVYACEVTHQGLSSPVTKSFNR >drugbank_drug|DB00073 Rituximab - Mouse-Human chimeric Anti-CD20 Light Chain 2 (SEQ ID NO: 321) QIVLSQSPAILSASPGEKVTMTCRASSSVSYMHWYQQKPGSSPKPWIYAPSNLASGVPAR FSGSGSGTSYSLTISRVEAEDAATYYCQQWSFNPPTFGAGTKLELKRTVAAPSVFIFPPS DEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTL SKADYEKHKVYACEVTHQGLSSPVTKSFNR >drugbank_drug|DB00055 Drotrecogin alfa - Heavy chain (SEQ ID NO: 322) LIDGKMTRRGDSPWQVVLLDSKKKLACGAVLIHPSWVLTAAHCMDESKKLLVRLGEYDLR RWEKWELDLDIKEVFVHPNYSKSTTDNDIALLHLAQPATLSQTIVPICLPDSGLAERELN QAGQETLVTGWGYHSSREKEAKRNRTFVLNFIKIPVVPHNECSEVMSNMVSENMLCAGIL GDRQDACEGDSGGPMVASFHGTWFLVGLVSWGEGCGLLHNYGVYTKVSRYLDWIHGHIRD KEAPQKSWAP >drugbank_drug|DB00055 Drotrecogin alfa - Light chain (SEQ ID NO: 323) SKHVDGDQCLVLPLEHPCASLCCGHGTCIXGIGSFSCDCRSGWEGRFCQREVSFLNCSLD NGGCTHYCLEEVGWRRCSCAPGYKLGDDLLQCHPAVKFPCGRPWKRMEKKRSHL

SEQUENCES OF NON-ANTIBODY POLYPEPTIDES FOR PREPARATION OF ENHANCED SEQUONS

SEQUENCE PORTIONS TO BE REVISED ARE UNDERLINED

Alpha chain of Follitropin beta (SEQ ID NO: 324):
  1 APDVQDCPECTLQENPFFSQPGAPILQCMGCCFSRAYPTPLRSKKTMLVQKNVTSESTCCVAKSYNRVTVM
 72 GGFKVENHTACHCSTCYYHKS

Beta chain of Follitropin beta (SEQ ID NO: 325):
  1 NSCELTNITIAIEKEECRFCISINTTWCAGYCYTRDLVYKDPARPKIQKTCTFKELVYETVRVPGCAHHAD
 72 SLYTYPVATQCHCGKCDSDSTDCTVRGLGPSYCSFGEMKE

Imiglucerase (SEQ ID NO: 326):
  1 ARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYESTRSGRRMELSMGPIQANHTGTGLLLTLQP
 72 EQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPMASCDFSIRTYTYADTPDDF
143 QLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDIYHQTWARYF
214 VKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQ
285 RLLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLG
356 SWDRGMQYSHSIITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKF
427 IPEGSQRVGLVASQKNDLDAVALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
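Before an enhanced sequon is designed into one of the chains listed above, it helps to know where canonical Asn-Yyy-Thr/Ser sequons (Yyy other than proline) already occur. The following Python sketch scans the two Follitropin beta chains (SEQ ID NOS: 324 and 325) with a lookahead regular expression. The function and constant names are hypothetical, and the scan is purely sequence-based: it cannot tell whether a site lies in a tight turn, which requires a three-dimensional structure.

```python
import re

# Canonical N-glycosylation sequon: Asn, then any residue except Pro,
# then Thr or Ser. The zero-width lookahead lets overlapping sequons
# (for example "NNTT") each be reported.
CANONICAL_SEQUON = re.compile(r"(?=N[^P][ST])")

# Follitropin beta chains as listed above (SEQ ID NOS: 324 and 325).
FSH_ALPHA = (
    "APDVQDCPECTLQENPFFSQPGAPILQCMGCCFSRAYPTPLRSKKTMLVQKNVTSESTCCVAKSYNRVTVM"
    "GGFKVENHTACHCSTCYYHKS"
)
FSH_BETA = (
    "NSCELTNITIAIEKEECRFCISINTTWCAGYCYTRDLVYKDPARPKIQKTCTFKELVYETVRVPGCAHHAD"
    "SLYTYPVATQCHCGKCDSDSTDCTVRGLGPSYCSFGEMKE"
)


def find_canonical_sequons(seq):
    """Return the 1-based positions of Asn in each Asn-Yyy-Thr/Ser sequon."""
    return [m.start() + 1 for m in CANONICAL_SEQUON.finditer(seq.upper())]


if __name__ == "__main__":
    for name, seq in (("alpha", FSH_ALPHA), ("beta", FSH_BETA)):
        # Prints the chain name, its length and the sequon Asn positions.
        print(name, len(seq), find_canonical_sequons(seq))
```

Run as a script, it prints `alpha 92 [52, 78]` and `beta 111 [7, 24]`, that is, Asn52 and Asn78 in the alpha chain and Asn7 and Asn24 in the beta chain. The same function can be applied to any other sequence in the listing.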

1. A chimeric therapeutic polypeptide of a pre-existing therapeutic polypeptide, said pre-existing therapeutic polypeptide having a length of about 15 to about 1000 amino acid residues and exhibiting a secondary structure that comprises at least one tight turn containing a sequence of four to about seven amino acid residues in which at least two amino acid side chains extend on the same side of the tight turn and are within less than about 7 Å of each other, said pre-existing therapeutic polypeptide lacking, within that sequence of four to about seven amino acid residues, the sequon, in the direction from left to right and from N-terminus to C-terminus, Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser (SEQ ID NO:001), wherein Aro is an aromatic amino acid residue, n is zero, 1, 2, 3 or 4, Xxx is an amino acid residue other than an aromatic residue, p is zero or 1, Zzz is any amino acid residue, Asn is asparagine, Yyy is any amino acid residue other than proline, and Thr/Ser is one or the other of the amino acid residues threonine and serine, said chimeric therapeutic polypeptide having substantially the same length, at least one tight turn and substantially the same sequence as the pre-existing therapeutic polypeptide, the two sequences differing by the presence in the chimeric therapeutic polypeptide of said sequon, Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser (SEQ ID NO:001), and said sequon being located at the same position in said tight turn as said sequence of four to about seven amino acid residues such that the side chains of the Aro, Asn and Thr/Ser amino acid residues project on the same side of the turn and are within less than about 7 Å of each other.
 2. The chimeric polypeptide according to claim 1, wherein said chimeric polypeptide, when said sequon is glycosylated, exhibits a folding stabilization enhanced by about −0.5 to about −4 kcal/mol compared to said pre-existing therapeutic polypeptide in non-glycosylated form.
 3. The chimeric polypeptide according to claim 1, wherein said aromatic amino acid residue is Phe, Trp, Tyr or His.
 4. The chimeric polypeptide according to claim 1, wherein n is 1.
 5. The chimeric polypeptide according to claim 1, wherein Thr/Ser is threonine.
 6. The chimeric polypeptide according to claim 1, wherein said pre-existing therapeutic chimeric polypeptide is an antibody.
 7. The chimeric polypeptide according to claim 6, wherein said antibody is OKT3.
 8. The chimeric polypeptide according to claim 1, wherein said pre-existing therapeutic chimeric polypeptide is a hormone.
 9. The chimeric polypeptide according to claim 8, wherein said pre-existing hormone is follicle-stimulating hormone, luteinizing hormone, human choriogonadotropin, human growth factor, or insulin.
 10. The chimeric polypeptide according to claim 1, wherein said pre-existing therapeutic chimeric polypeptide is selected from the group consisting of factor VIII, factor IX, erythropoietin, hepatitis B surface protein, tPA, plasmin, streptokinase, urokinase, thrombin, follicle-stimulating hormone, luteinizing hormone, human choriogonadotropin, IL-2, GM-CSF and IFN-γ.
 11. The chimeric polypeptide according to claim 1 that has a length of about 25 to about 500 amino acid residues.
 12. The chimeric polypeptide according to claim 1, wherein Asn occupies the i+2 position of a five-residue type I β-bulge turn that spans from Aro at the i position to Ser/Thr at the i+4 position.
 13. The chimeric polypeptide according to claim 1, wherein Asn occupies the i+1 position of a four-residue type I′ β-turn that spans from Aro at the i position to Ser/Thr at the i+3 position.
 14. The chimeric polypeptide according to claim 1, wherein Asn occupies the i+3 position of a six-residue 4:6 hairpin loop turn that spans from Aro at the i position to Ser/Thr at the i+5 position.
 15. The chimeric polypeptide according to claim 1, wherein said sequon has the sequence Lys-(Zzz)m-Aro-(Xxx)n-Zzz-Asn-Yyy-Thr/Ser (SEQ ID NO:327), wherein m is zero to three, and Lys is lysine.
 16. The chimeric polypeptide according to claim 1, wherein said sequon has the sequence Aro-Xxx-Zzz-Asn-Yyy-Thr/Ser (SEQ ID NO:008).
 17. The chimeric polypeptide according to claim 1, wherein said sequon has the sequence Aro-Xxx-Asn-Yyy-Thr/Ser (SEQ ID NO:002).
 18. The chimeric polypeptide according to claim 1, wherein said sequon has the sequence Aro-Asn-Yyy-Thr/Ser (SEQ ID NO:003).
 19. The chimeric polypeptide according to claim 1, wherein the Asn of said sequon is glycosylated.
 20. A method of enhancing folded stabilization of a chimeric therapeutic polypeptide compared to a pre-existing therapeutic polypeptide, wherein said pre-existing therapeutic polypeptide comprises a sequence of about 15 to about 1000 amino acid residues and exhibits a secondary structure that comprises at least one tight turn in which the side chains of two residues in a sequence of four to about seven amino acid residues within said tight turn project on the same side of the turn and are within less than about 7 Å of each other, said sequence of four to about seven amino acid residues being free of the sequon Aro-(Xxx)n-(Zzz)p-Asn(Glycan)-Yyy-Thr/Ser (SEQ ID NO:005), as defined below, said method comprising the step of: preparing a therapeutic chimeric polypeptide of the same length and substantially the same sequence as the pre-existing therapeutic polypeptide that exhibits a secondary structure comprising at least one tight turn at the same sequence position within the tight turn of said pre-existing therapeutic polypeptide, except that said sequence of four to about seven amino acid residues is replaced with the sequon, in the direction from left to right and from N-terminus to C-terminus, Aro-(Xxx)n-(Zzz)p-Asn(Glycan)-Yyy-Thr/Ser (SEQ ID NO:005), wherein Aro is an aromatic amino acid residue, n is zero, 1, 2, 3 or 4, Xxx is an amino acid residue other than an aromatic residue, p is zero or 1, Zzz is any amino acid residue, Asn(Glycan) is glycosylated asparagine, Yyy is any amino acid residue other than proline, Thr/Ser is one or the other of the amino acid residues threonine and serine, and the side chains of the Aro, Asn(Glycan) and Thr/Ser amino acid residues project on the same side of the turn and are within less than about 7 Å of each other.
 21. The method according to claim 20, wherein Asn(Glycan) is an N⁴-[2-(acetylamino)-2-deoxy-β-D-glucopyranosyl]-L-asparaginyl residue [Asn(GlcNAc)₁].
 22. The method according to claim 20, wherein Asn(Glycan) is Asn(GlcNAc)₂.
 23. The method according to claim 20, wherein Asn(Glycan) is Asn(GlcNAc)₂Man₁.
 24. The method according to claim 20, wherein the glycan of Asn(Glycan) is paucimannose.
 25. The method according to claim 20, wherein said therapeutic chimeric polypeptide is prepared by expressing a nucleic acid sequence that encodes the polypeptide sequence of said therapeutic chimeric polypeptide in a host cell that glycosylates the amino acid sequence Aro-(Xxx)n-(Zzz)p-Asn(Glycan)-Yyy-Thr/Ser (SEQ ID NO:005) when present in a polypeptide sequence expressed therein.
 26. The method according to claim 20, wherein said therapeutic chimeric polypeptide is prepared by in vitro peptide synthesis.
 27. The method according to claim 26, wherein said in vitro peptide synthesis is by solid phase means.
 28. The method according to claim 20, wherein said sequence of four to about seven amino acid residues within said tight turn of said pre-existing therapeutic polypeptide is glycosylation-free.
 29. A pharmaceutical composition comprising a pharmaceutically acceptable diluent having dissolved or dispersed therein an effective amount of a therapeutic chimeric polypeptide according to claim 1 in which the Asn of said sequon is glycosylated.
 30. The pharmaceutical composition according to claim 29 wherein said chimeric therapeutic polypeptide is an antibody or a hormone. 